CN110855993A - Method and device for predicting motion information of image block - Google Patents

Method and device for predicting motion information of image block

Info

Publication number
CN110855993A
Authority
CN
China
Prior art keywords: motion information, pixel point, candidate, information corresponding, representing
Prior art date
Legal status
Pending
Application number
CN201811015602.9A
Other languages
Chinese (zh)
Inventor
杨海涛
赵寅
徐巍炜
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2019/099481 (published as WO2020038232A1)
Publication of CN110855993A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a method and a device for predicting motion information of an image block. The method includes: determining at least one target pixel point having a preset position relation with an image block to be processed, where the target pixel point is adjacent to the straight line on which the upper edge of the coding tree unit containing the image block to be processed lies, or to the straight line on which the left edge of that coding tree unit lies, and the target pixel point is located outside the coding tree unit; adding the motion information corresponding to the at least one target pixel point into a set of candidate motion information of the image block to be processed; and determining target motion information from the candidate motion information set, where the target motion information is used for predicting the motion information of the image block to be processed.

Description

Method and device for predicting motion information of image block
Technical Field
The present application relates to the field of video image technologies, and in particular, to a method and an apparatus for predicting motion information of an image block.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, Personal Digital Assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and so forth. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC), and in extensions of these standards, to more efficiently transmit and receive digital video information. Video devices may more efficiently transmit, receive, encode, decode, and/or store digital video information by implementing these video codec techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice may be partitioned into video blocks, which may also be referred to as treeblocks, Coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Disclosure of Invention
The embodiments of the application provide a method and a device for predicting motion information of an image block, in which suitable candidate motion information is selected as the motion information predictor of the image block to be processed, thereby improving the effectiveness of motion information prediction and the coding and decoding efficiency.
It should be understood that, in general, the motion information includes a motion vector and index information of a reference frame to which the motion vector points, and the like. In one possible implementation of the embodiments of the present application, the prediction of motion information refers to prediction of a motion vector.
In a first aspect of an embodiment of the present application, a method for predicting motion information of an image block is provided, including: determining at least one target pixel point having a preset position relation with an image block to be processed, where the target pixel point is adjacent to the straight line on which the upper edge and/or the left edge of the coding tree unit containing the image block to be processed lies, and the target pixel point is located outside the coding tree unit; adding the motion information corresponding to the at least one target pixel point into a set of candidate motion information of the image block to be processed; and determining target motion information from the candidate motion information set, where the target motion information is used for predicting the motion information of the image block to be processed.
Compared with the prior art, the method selects motion information from the upper neighboring blocks and/or the left neighboring blocks of the coding tree unit (CTU) as candidate prediction motion information, so that less spatial motion information needs to be accessed and the complexity is lower.
The coordinates of the upper left corner of the coding tree unit are denoted PM = (xM, yM), the width of the coding tree unit is W, and its height is H, where xM is the horizontal coordinate and yM is the vertical coordinate. The upper edge of the coding tree unit is the region containing the coordinates PG = (xG, yG) with xM ≤ xG ≤ xM+W-1 and yG = yM; the left edge of the coding tree unit is the region containing the coordinates PH = (xH, yH) with yM ≤ yH ≤ yM+H-1 and xH = xM.
The target pixel point being adjacent to the straight line on which the upper edge of the coding tree unit containing the image block to be processed lies means that the difference between the vertical coordinate of the target pixel point and the vertical coordinate of the upper edge of that coding tree unit is smaller than or equal to a preset first threshold; the target pixel point being adjacent to the straight line on which the left edge of the coding tree unit lies means that the difference between the horizontal coordinate of the target pixel point and the horizontal coordinate of the left edge of that coding tree unit is smaller than or equal to a preset second threshold.
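As a rough illustration only, the adjacency condition above can be expressed as a small check. The helper below is a minimal sketch, assuming the first and second thresholds both equal 1 pixel; it is not code from this application.

```python
# Sketch of the adjacency test: (ctu_x, ctu_y) is the upper-left corner of the
# coding tree unit, ctu_w/ctu_h are its width/height, and T1/T2 are the preset
# first and second thresholds (the default values here are assumptions).

def is_target_pixel(px, py, ctu_x, ctu_y, ctu_w, ctu_h, T1=1, T2=1):
    inside_ctu = (ctu_x <= px < ctu_x + ctu_w) and (ctu_y <= py < ctu_y + ctu_h)
    if inside_ctu:
        return False  # the target pixel point must lie outside the coding tree unit
    near_top_line = abs(py - ctu_y) <= T1   # close to the line of the upper edge
    near_left_line = abs(px - ctu_x) <= T2  # close to the line of the left edge
    return near_top_line or near_left_line
```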
The motion information corresponding to the at least one target pixel point may include motion information corresponding to each target pixel point in the at least one target pixel point.
In a first possible implementation manner of the first aspect, the target pixel point includes a first pixel point, where a straight line where the first pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the straight line where the first pixel point is located, and the second foot is a vertical projection point of a pixel point at a lower left corner of the to-be-processed image block on the straight line where the first pixel point is located.
When the at least one target pixel point is a plurality of target pixel points, the plurality of target pixel points may include a plurality of first pixel points.
In a second possible implementation manner of the first aspect, the target pixel further includes a second pixel, where a straight line where the second pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, the second pixel is located on a line segment with a third foot and a fourth foot as end points, the third foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the straight line where the second pixel is located, and the fourth foot is a vertical projection point of a pixel point at an upper right corner of the to-be-processed image block on the straight line where the second pixel is located.
When the at least one target pixel point is a plurality of target pixel points, the plurality of target pixel points may include one or more second pixel points.
Optionally, the target pixels may include one or more first pixels and one or more second pixels.
In a third possible implementation manner of the first aspect, when a pixel point in the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length used for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length used for representing motion information corresponding to the first pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
When the pixel point at the upper left corner of the image block to be processed is located at the upper right half of the coding tree unit, because the distance between the image block to be processed and the second pixel point is smaller than the distance between the image block to be processed and the first pixel point, the correlation is higher, and the selected probability is higher, the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information is smaller than or equal to the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information, and the coding efficiency can be improved. Based on a similar principle, when a pixel point at the upper left corner of the to-be-processed image block is located in the lower left half of the coding tree unit, the length of a codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of a codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information, so that the coding efficiency can also be improved.
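One way to realize this codeword-length constraint is to place the candidate with the higher expected correlation earlier in the list, so that it receives a smaller index and therefore a shorter (for example truncated unary) codeword. The sketch below illustrates this idea; the half test and the truncated unary index coding are assumptions for illustration, not definitions from this application.

```python
# Sketch: order the first/second pixel point candidates so that the closer
# (more correlated) one gets the smaller index, hence the shorter codeword.
# Assumes the CTU is split into left/right halves at half its width.

def order_first_second(first_mi, second_mi, x0, x1, ctu_width):
    in_upper_right_half = (x0 - x1) >= ctu_width // 2
    if in_upper_right_half:
        return [second_mi, first_mi]   # second pixel point (above the CTU) first
    return [first_mi, second_mi]       # lower left half: first pixel point first

def truncated_unary_length(index, list_size):
    # index 0 -> 1 bit, index 1 -> 2 bits, ...; the last index saves the stop bit
    return index + 1 if index < list_size - 1 else index
```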
In a fourth possible implementation manner of the first aspect, the target pixel further includes a third pixel, where when a straight line where the third pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, the third pixel is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is an intersection point of the straight line where the first pixel is located and the straight line where the second pixel is located; and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third vertical foot and the reference intersection point as end points.
When the at least one target pixel point is a plurality of target pixel points, the plurality of target pixel points may include one or more third pixel points.
Optionally, the plurality of target pixel points may include one or more first pixel points and one or more third pixel points; or, the plurality of target pixel points may include a plurality of second pixel points and a plurality of third pixel points, or the plurality of target pixel points may include one or more first pixel points, one or more second pixel points, and one or more third pixel points.
In a fifth possible implementation manner of the first aspect, a length of a codeword used to represent motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent motion information corresponding to the third pixel point in the set of candidate motion information, and a length of a codeword used to represent motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent motion information corresponding to the third pixel point in the set of candidate motion information.
In this embodiment, the code word length of the motion information corresponding to the first pixel point, the second pixel point, and the third pixel point is limited, so that the encoding efficiency can be improved.
In a sixth possible implementation manner of the first aspect, when a pixel point at an upper left corner of the to-be-processed image block is located in an upper right half of the coding tree unit, a straight line where the third pixel point is located is parallel to a straight line where an upper edge of the coding tree unit is located; when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
When the pixel point at the upper left corner of the image block to be processed is located at the upper right half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, so that the distance between the third pixel point and the pixel point in the image block to be processed is smaller. By taking the third pixel point as a target pixel point, the correlation between the motion information corresponding to the third pixel point and the motion information corresponding to the pixel point in the image block to be processed can be higher.
Based on a similar principle, when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located, so that the correlation between the motion information corresponding to the third pixel point and the motion information corresponding to the pixel point in the image block to be processed is higher.
In a seventh possible implementation manner of the first aspect, the target pixel further includes a fourth pixel, where a straight line where the fourth pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, and the fourth pixel is located on a ray that uses the second foot as an end point and that uses a direction from the first foot to the second foot as a direction.
In an eighth feasible implementation manner of the first aspect, the target pixel further includes a fifth pixel, where a straight line where the fifth pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, and the fifth pixel is located on a ray that uses the fourth foot as an end point and that uses a direction from the third foot to the fourth foot as a direction.
In a ninth possible implementation manner of the first aspect, when a pixel point in the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length used for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length used for representing motion information corresponding to the fourth pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information.
When the coding tree unit is divided into two regions, the left region is referred to as the left half (the lower left half mentioned above), and the right region is referred to as the right half (the upper right half mentioned above). Optionally, the width of the left half and the width of the right half are each as close as possible to half the width of the coding tree unit; when the width of the coding tree unit can be divided evenly into two, the width of each half is equal to half the width of the coding tree unit.
When the pixel point at the upper left corner of the image block to be processed is located at the upper right half of the coding tree unit, because the distance between the image block to be processed and the fifth pixel point is smaller than the distance between the image block to be processed and the fourth pixel point, the correlation is higher, and the selected probability is higher, the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information is smaller than or equal to the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information, and the coding efficiency can be improved. Based on a similar principle, when a pixel point at the upper left corner of the to-be-processed image block is located in the lower left half of the coding tree unit, the length of a codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of a codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information, so that the coding efficiency can also be improved.
In a tenth possible implementation manner of the first aspect, the codeword lengths used to represent, in the set of candidate motion information, the motion information corresponding to the first to fifth pixel points satisfy the following orderings, where each comparison is between codeword lengths and "≤" means less than or equal to: when the pixel point at the upper left corner of the image block to be processed is located in the upper left quarter of the coding tree unit, first pixel point ≤ second pixel point ≤ third pixel point ≤ fifth pixel point ≤ fourth pixel point; when it is located in the upper right quarter of the coding tree unit, second pixel point ≤ third pixel point ≤ fifth pixel point ≤ first pixel point ≤ fourth pixel point; when it is located in the lower left quarter of the coding tree unit, first pixel point ≤ third pixel point ≤ fourth pixel point ≤ second pixel point ≤ fifth pixel point; when it is located in the lower right quarter of the coding tree unit, first pixel point ≤ second pixel point ≤ third pixel point ≤ fifth pixel point ≤ fourth pixel point.
The coding tree unit is divided into four regions arranged in a 2×2 grid (the shape of the Chinese character "田"): the upper left region is called the upper left quarter, the lower left region the lower left quarter, the upper right region the upper right quarter, and the lower right region the lower right quarter.
Optionally, the width of each of the four regions is as close as possible to half the width of the coding tree unit; when the width of the coding tree unit can be divided evenly into two, the width of each of the four regions is equal to half the width of the coding tree unit.
Optionally, the height of each of the four regions is as close as possible to half the height of the coding tree unit; when the height of the coding tree unit can be divided evenly into two, the height of each of the four regions is equal to half the height of the coding tree unit.
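For illustration only, the region test can be written as follows; how the boundary is placed when the width or height is odd is an assumption here, not a rule stated in this application.

```python
# Sketch: decide which quarter of the coding tree unit contains the top-left
# pixel (x0, y0) of the image block; (x1, y1) is the CTU's top-left corner and
# W1, H1 are the CTU width and height.

def ctu_quarter(x0, y0, x1, y1, W1, H1):
    right = (x0 - x1) >= W1 // 2
    lower = (y0 - y1) >= H1 // 2
    if lower:
        return "lower right quarter" if right else "lower left quarter"
    return "upper right quarter" if right else "upper left quarter"
```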
When the pixel point at the upper left corner of the image block to be processed is located at the upper left quarter of the coding tree unit, because the distance between the image block to be processed and the first pixel point is smaller than the distance between the image block to be processed and the second pixel point, the correlation is higher, and the selected probability is higher, the length of the code word used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is smaller than or equal to the length of the code word used for representing the motion information corresponding to the second pixel point in the set of candidate motion information, and the coding efficiency can be improved.
Based on a similar principle, in this embodiment, the encoding efficiency may be improved by limiting the codeword lengths used to represent the motion information corresponding to the second pixel point, the third pixel point, the fourth pixel point, and the fifth pixel point in the set of candidate motion information.
Based on a similar principle, in this embodiment, when the pixel point at the upper left corner of the image block to be processed is located in another part of the coding tree unit, the coding efficiency may be improved by limiting the codeword lengths used to represent the motion information corresponding to the first pixel point, the second pixel point, the third pixel point, the fourth pixel point, and the fifth pixel point in the set of candidate motion information.
In an eleventh possible implementation manner of the first aspect, the horizontal direction to the right is the positive direction of the horizontal axis of a rectangular coordinate system, the vertically downward direction is the positive direction of the vertical axis of the rectangular coordinate system, the coordinates of the pixel point at the upper left corner of the image block to be processed are (x0, y0), and the coordinates of the pixel point at the upper left corner of the coding tree unit are (x1, y1), where the coordinates of the first pixel point include: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), wherein H0 is the height of the image block to be processed.
In a twelfth possible implementation manner of the first aspect, the coordinates of the second pixel point include: (x0+W0-1, y1-1), or (x0+W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
In a thirteenth possible implementation manner of the first aspect, the coordinates of the third pixel point include: (x0-y0+y1-1, y1-1) or (x1-1, y0-x0+x1-1).
In a fourteenth possible implementation manner of the first aspect, when (x0-x1) > (y0-y1), the coordinates of the third pixel point include (x0-y0+y1-1, y1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+x1-1).
In a fifteenth possible implementation manner of the first aspect, the coordinates of the fourth pixel point include: (x1-1, y0+H0+x0-x1), or (x1-1, y0+H0+(x0-x1)/2), or (x1-1, min(y0+H0+x0-x1, y1+H1-1)), wherein H1 is the height of the coding tree unit.
In a sixteenth possible implementation manner of the first aspect, the coordinates of the fifth pixel point include: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1×3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), wherein W1 is the width of the coding tree unit.
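Collecting the eleventh to sixteenth implementation manners, a sketch of the coordinate computation is given below. Only one of the listed alternatives of each formula is used (the clamped variants for the fourth and fifth pixel points), the choice for the third pixel point follows the fourteenth implementation manner with the equality case handled as an assumption, and the function is illustrative rather than a definition from this application.

```python
# Sketch: candidate target pixel coordinates for a block whose top-left pixel
# is (x0, y0), with width W0 and height H0, inside a CTU whose top-left pixel
# is (x1, y1), with width W1 and height H1.

def target_pixel_coordinates(x0, y0, W0, H0, x1, y1, W1, H1):
    first = (x1 - 1, y0 + H0 - 1)               # on the column left of the CTU
    second = (x0 + W0 - 1, y1 - 1)              # on the row above the CTU
    if (x0 - x1) >= (y0 - y1):                  # fourteenth implementation manner
        third = (x0 - y0 + y1 - 1, y1 - 1)
    else:
        third = (x1 - 1, y0 - x0 + x1 - 1)
    fourth = (x1 - 1, min(y0 + H0 + x0 - x1, y1 + H1 - 1))
    fifth = (min(x0 + W0 + y0 - y1, x1 + W1 * 3 // 2), y1 - 1)
    return first, second, third, fourth, fifth
```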
In a seventeenth possible implementation manner of the first aspect, before adding the motion information corresponding to the at least one target pixel point into the set of candidate motion information of the image block to be processed, the method further includes: adding the motion information of the spatially adjacent blocks and/or the temporally co-located blocks at preset positions of the image block to be processed into the set of candidate motion information of the image block to be processed.
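As a sketch of the resulting construction order (the duplicate check and the maximum list length are assumptions, not requirements stated here):

```python
# Sketch: spatial/temporal candidates are added before the motion information
# of the target pixel points; duplicates are skipped.

def build_candidate_set(spatial_temporal_mi, target_pixel_mi, max_len=7):
    candidates = []
    for mi in list(spatial_temporal_mi) + list(target_pixel_mi):
        if mi not in candidates:
            candidates.append(mi)
        if len(candidates) == max_len:
            break
    return candidates
```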
In an eighteenth possible implementation manner of the first aspect, the method is configured to decode the image block to be processed, and the determining target motion information from the set of candidate motion information includes: analyzing the code stream to obtain identification information; and determining the target motion information according to the identification information.
In a nineteenth possible implementation manner of the first aspect, the method is configured to encode the image block to be processed, and the determining target motion information from the set of candidate motion information includes: selecting the motion information with the minimum coding cost from the set of candidate motion information as the target motion information.
In a twentieth possible implementation manner of the first aspect, after the motion information with the smallest coding cost is selected from the set of candidate motion information as the target motion information, the method further includes: encoding the identification information of the target motion information.
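A sketch of the two sides of this selection (eighteenth to twentieth implementation manners) is given below; the bitstream reader and the cost function are assumed interfaces, not APIs defined in this application.

```python
# Sketch: the decoder parses identification information (an index) and picks
# the candidate; the encoder picks the minimum-cost candidate and returns the
# index that will be encoded as identification information.

def decode_target_motion_info(candidates, read_index):
    index = read_index(len(candidates))    # parse the identification information
    return candidates[index]

def encode_target_motion_info(candidates, coding_cost):
    index = min(range(len(candidates)), key=lambda i: coding_cost(candidates[i]))
    return index, candidates[index]
```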
In a second aspect of the embodiments of the present application, there is provided an apparatus for predicting motion information of an image block, including: an acquisition module, configured to determine at least one target pixel point having a preset position relation with an image block to be processed, where the target pixel point is adjacent to the straight line on which the upper edge or the left edge of the coding tree unit containing the image block to be processed lies, and the target pixel point is located outside the coding tree unit; a list module, configured to add the motion information corresponding to the at least one target pixel point into a set of candidate motion information of the image block to be processed; and an index module, configured to determine target motion information from the candidate motion information set, where the target motion information is used for predicting the motion information of the image block to be processed.
In a first possible implementation manner of the second aspect, the target pixel point includes a first pixel point, where a straight line where the first pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the straight line where the first pixel point is located, and the second foot is a vertical projection point of a pixel point at a lower left corner of the to-be-processed image block on the straight line where the first pixel point is located.
In a second possible implementation manner of the second aspect, the target pixel further includes a second pixel, where a straight line where the second pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, the second pixel is located on a line segment with a third foot and a fourth foot as end points, the third foot is a vertical projection point of a pixel point at the upper left corner of the to-be-processed image block on the straight line where the second pixel is located, and the fourth foot is a vertical projection point of a pixel point at the upper right corner of the to-be-processed image block on the straight line where the second pixel is located.
In a third possible implementation manner of the second aspect, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length used for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length used for representing motion information corresponding to the first pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
In a fourth possible implementation manner of the second aspect, the target pixel further includes a third pixel, where when a straight line where the third pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, the third pixel is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is an intersection point of the straight line where the first pixel is located and the straight line where the second pixel is located; and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third vertical foot and the reference intersection point as end points.
In a fifth possible implementation manner of the second aspect, a length of a codeword used to represent motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent motion information corresponding to the third pixel point in the set of candidate motion information, and a length of a codeword used to represent motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent motion information corresponding to the third pixel point in the set of candidate motion information.
In a sixth possible implementation manner of the second aspect, when a pixel point at an upper left corner of the to-be-processed image block is located at an upper right half of the coding tree unit, a straight line where the third pixel point is located is parallel to a straight line where an upper edge of the coding tree unit is located; when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
In a seventh possible implementation manner of the second aspect, the target pixel further includes a fourth pixel, where a straight line where the fourth pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, and the fourth pixel is located on a ray that uses the second foot as an end point and that uses a direction from the first foot to the second foot as a direction.
In an eighth possible implementation manner of the second aspect, the target pixel further includes a fifth pixel, where a straight line where the fifth pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, and the fifth pixel is located on a ray that uses the fourth foot as an end point and that uses a direction from the third foot to the fourth foot as a direction.
In a ninth possible implementation manner of the second aspect, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length used for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length used for representing motion information corresponding to the fourth pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information.
In a tenth possible implementation manner of the second aspect, the codeword lengths used to represent, in the set of candidate motion information, the motion information corresponding to the first to fifth pixel points satisfy the following orderings, where each comparison is between codeword lengths and "≤" means less than or equal to: when the pixel point at the upper left corner of the image block to be processed is located in the upper left quarter of the coding tree unit, first pixel point ≤ second pixel point ≤ third pixel point ≤ fifth pixel point ≤ fourth pixel point; when it is located in the upper right quarter of the coding tree unit, second pixel point ≤ third pixel point ≤ fifth pixel point ≤ first pixel point ≤ fourth pixel point; when it is located in the lower left quarter of the coding tree unit, first pixel point ≤ third pixel point ≤ fourth pixel point ≤ second pixel point ≤ fifth pixel point; when it is located in the lower right quarter of the coding tree unit, first pixel point ≤ second pixel point ≤ third pixel point ≤ fifth pixel point ≤ fourth pixel point.
In an eleventh possible implementation manner of the second aspect, the horizontal direction to the right is the positive direction of the horizontal axis of a rectangular coordinate system, the vertically downward direction is the positive direction of the vertical axis of the rectangular coordinate system, (x0, y0) are the coordinates of the pixel point at the upper left corner of the image block to be processed, (x1, y1) are the coordinates of the pixel point at the upper left corner of the coding tree unit, and the coordinates of the first pixel point include: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), wherein H0 is the height of the image block to be processed.
In a twelfth possible implementation manner of the second aspect, the coordinates of the second pixel point include: (x0+W0-1, y1-1), or (x0+W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
In a thirteenth possible implementation manner of the second aspect, the coordinates of the third pixel point include: (x0-y0+y1-1, y1-1) or (x1-1, y0-x0+x1-1).
In a fourteenth possible implementation manner of the second aspect, when (x0-x1) > (y0-y1), the coordinates of the third pixel point include (x0-y0+y1-1, y1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+x1-1).
In a fifteenth possible implementation manner of the second aspect, the coordinates of the fourth pixel point include: (x1-1, y0+H0+x0-x1), or (x1-1, y0+H0+(x0-x1)/2), or (x1-1, min(y0+H0+x0-x1, y1+H1-1)), wherein H1 is the height of the coding tree unit.
In a sixteenth possible implementation manner of the second aspect, the coordinates of the fifth pixel point include: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1×3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), wherein W1 is the width of the coding tree unit.
In a seventeenth possible implementation manner of the second aspect, the list module is further configured to add the motion information of the spatially adjacent blocks and/or the temporally co-located blocks at preset positions of the image block to be processed into the set of candidate motion information of the image block to be processed.
In an eighteenth possible implementation manner of the second aspect, the apparatus is configured to decode the to-be-processed image block, and the indexing module is specifically configured to: analyzing the code stream to obtain identification information; and determining the target motion information according to the identification information.
In a nineteenth possible implementation manner of the second aspect, the apparatus is configured to encode the image block to be processed, and the index module is specifically configured to select, from the set of candidate motion information, the motion information with the minimum coding cost as the target motion information.
In a twentieth possible implementation manner of the second aspect, the index module is further configured to encode the identification information of the target motion information.
In a third aspect of embodiments of the present application, there is provided a prediction apparatus of motion information, including: a processor and a memory coupled to the processor; the processor is configured to perform the method of the first aspect.
In a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the first aspect.
In a fifth aspect of embodiments of the present application, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
It should be understood that the second to fifth aspects of the present application are consistent with the technical solutions of the first aspect of the present application; the beneficial effects obtained by these aspects and by the corresponding feasible implementations are similar and are not repeated here.
Drawings
FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application;
FIG. 2 is a schematic block diagram of a video encoder in an embodiment of the present application;
FIG. 3 is a schematic block diagram of a video decoder in an embodiment of the present application;
FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application;
FIG. 5 is an exemplary flowchart of merging prediction modes in an embodiment of the present application;
FIG. 6 is an exemplary flowchart of an advanced MVP mode in an embodiment of the present application;
FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application;
FIG. 8 is an exemplary diagram of a coding unit and an adjacent-position image block associated therewith in an embodiment of the present application;
FIG. 9 is an exemplary flowchart of constructing a candidate predicted motion vector list according to an embodiment of the present application;
FIG. 10 is an exemplary diagram illustrating adding a combined candidate motion vector to a merge mode candidate prediction motion vector list according to an embodiment of the present application;
FIG. 11 is an exemplary diagram illustrating the addition of a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application;
FIG. 12 is an exemplary diagram illustrating adding a zero motion vector to the merge mode candidate prediction motion vector list according to an embodiment of the present application;
FIG. 13 is another exemplary diagram of a coding unit and an adjacent-position image block associated therewith in an embodiment of the present application;
FIG. 14 is an exemplary diagram of spatial fusion candidates in an embodiment of the present application;
FIG. 15 is an exemplary diagram of non-neighboring spatial fusion candidates in an embodiment of the present application;
FIG. 16 is another exemplary diagram of non-neighboring spatial fusion candidates in an embodiment of the present application;
FIG. 17 is a schematic diagram of another exemplary non-neighboring spatial fusion candidate in an embodiment of the present application;
FIG. 18 is a schematic diagram of another exemplary non-neighboring spatial fusion candidate in an embodiment of the present application;
FIG. 19 is an exemplary flowchart of a motion information prediction method according to an embodiment of the present application;
FIG. 20 is a block diagram illustrating an exemplary configuration of a motion information prediction apparatus according to an embodiment of the present application;
FIG. 21 is a schematic block diagram of a motion information prediction apparatus in an embodiment of the present application;
FIG. 22 is a diagram illustrating a location of a target pixel in an embodiment of the present application;
FIG. 23 is another schematic diagram of the location of a target pixel in the embodiment of the present application;
FIG. 24 is a schematic diagram of region division of a coding tree unit according to an embodiment of the present application;
FIG. 25 is a diagram illustrating another example of the location of a target pixel in the embodiment of the present application;
FIG. 26 is a schematic diagram of another region division of a coding tree unit in an embodiment of the present application.
Detailed Description
First, terms used in the embodiments of the present application will be described.
Coding Tree Unit (CTU): an image is made up of a plurality of CTUs; one CTU generally corresponds to a square image area and contains the luminance pixels and chrominance pixels in that image area (or only luminance pixels, or only chrominance pixels). The CTU also contains syntax elements that indicate how to divide the CTU into at least one Coding Unit (CU) and the method of decoding each coding unit to obtain a reconstructed image.
CU: a rectangular region corresponding to a x B in the image contains a x B luminance pixels or/and its corresponding chrominance pixels, a being the width of the rectangle, B being the height of the rectangle, a and B being the same or different, a and B typically taking values to the power of 2, e.g. 128, 64, 32, 16, 8, 4. One coding unit comprises a predicted image and a residual image, and the predicted image and the residual image are added to obtain a reconstructed image of the coding unit. The prediction image is generated by intra prediction or inter prediction, and the residual image is generated by inverse quantization and inverse transformation processing of the transformation coefficient.
Video decoding (video decoding): and restoring the video code stream into a reconstructed image according to a specific grammar rule and a specific processing method.
Video encoding (video encoding): compressing an image sequence into a code stream.
Video coding (video coding): a collective term for video encoding and video decoding; its Chinese translation is the same as that of video encoding.
VTM: new codec reference software developed by the JVET organization.
Fusion candidate (merge candidate or merging candidate): a motion information data structure that includes several kinds of information such as the inter-frame prediction direction, the reference frame, and the motion vector. The current block may select a corresponding fusion candidate from a fusion candidate list (merge candidate list) according to a fusion index (merge index), and use the motion information of the fusion candidate as the motion information of the current block, or use the motion information of the fusion candidate as the motion information of the current block after scaling.
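For illustration only, such a fusion candidate can be represented as a small data structure; the following C++ sketch is a hypothetical layout under the definitions above, not the structure of any particular codec implementation.

```cpp
#include <cstdint>

// Hypothetical layout of a fusion (merge) candidate: a bundle of motion
// information, not the exact structure used by any standard or reference codec.
struct MotionVector {
    int16_t x;  // horizontal displacement component
    int16_t y;  // vertical displacement component
};

struct MergeCandidate {
    uint8_t      interDir;   // 1: forward (L0), 2: backward (L1), 3: bi-directional
    int8_t       refIdx[2];  // reference frame index per reference list (-1: unused)
    MotionVector mv[2];      // one motion vector per used reference list
};
```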
The related technology of the embodiment of the application is introduced:
video coding standards such as H.265/HEVC and H.266/VVC divide a frame of image into CTUs which do not overlap with each other. The size of the CTU is set to 64 × 64 or 128 × 128, for example. A 64 x 64 CTU comprises 64 columns of pixels, each column comprising 64 pixels, each pixel comprising a luminance component or/and a chrominance component. One CTU is divided into one or more coding units CU.
One CU contains coding information, including the prediction mode, transform coefficients, and so on. The CU is subjected to decoding processing such as prediction, inverse quantization, and inverse transformation in accordance with the coding information, and a reconstructed image corresponding to the CU is generated. A CU corresponds to a predicted image and a residual image, and the predicted image and the residual image are added to obtain the reconstructed image. The predicted image is generated by intra prediction or inter prediction, and the residual image is generated by inverse quantization and inverse transformation of the transform coefficients. A prediction image of a CU is composed of one or more Prediction Blocks (PB), and a residual image of the CU is composed of one or more Transform Blocks (TB).
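As a minimal sketch of the per-pixel reconstruction step described above (reconstructed sample = predicted sample + residual sample), assuming 8-bit samples and hypothetical function and variable names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal sketch of CU reconstruction: each reconstructed sample is the sum
// of the corresponding predicted sample and residual sample, clipped here to
// the 8-bit sample range. Names and the flat pixel layout are illustrative.
std::vector<uint8_t> reconstructCu(const std::vector<int>& pred,
                                   const std::vector<int>& resi) {
    std::vector<uint8_t> rec(pred.size());
    for (std::size_t i = 0; i < pred.size(); ++i) {
        int v = pred[i] + resi[i];
        rec[i] = static_cast<uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
    return rec;
}
```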
An image block that is being encoded/decoded is referred to as a Current Block (CB), and for example, when an image block is being predicted, the current block is a prediction block; when an image block is being residual processed, the current block is a transform block. The image in which the current block is located is called the current frame. In the current frame, image blocks located on the left or upper side of the current block may be inside the current frame and have completed encoding/decoding processing, resulting in reconstructed images, which are referred to as reconstructed blocks; information of the coding mode, reconstructed pixels, etc. of the reconstructed block is available (available). A frame in which the encoding/decoding process has been completed before the encoding/decoding of the current frame is referred to as a reconstructed frame. When the current frame is a uni-directionally predicted frame (P frame) or a bi-directionally predicted frame (B frame), it has one or two reference frame lists, respectively, referred to as L0 and L1, each of which contains at least one reconstructed frame, referred to as the reference frame of the current frame. The reference frame provides reference pixels for inter-frame prediction of the current frame.
Inter-frame prediction is a prediction technique based on motion compensation, and mainly processes the motion information of a current block, acquires a reference image block from a reference frame of the current block according to the motion information, and generates a prediction image of the current block. The motion information includes inter prediction direction indicating which prediction direction the current block uses among forward prediction, backward prediction, or bi-directional prediction, reference frame, motion vector indicating a displacement vector of a reference image block in the reference frame used for predicting the current block with respect to the current block, and so on, and thus one motion vector corresponds to one reference frame. Inter prediction of an image block can generate a predicted image using pixels in a reference frame by only one motion vector, which is called unidirectional prediction; a prediction image can also be generated by two motion vectors using a combination of pixels in two reference frames, called bi-prediction. That is, an image block may typically contain one or two motion vectors. For some multi-hypothesis inter prediction (multi-prediction) techniques, an image block may contain more than two motion vectors.
Inter prediction indicates a reference frame (reference frame) by a reference frame index (ref_idx), and indicates, by a Motion Vector (MV), the position offset of the reference block (reference block) of the current block in the reference frame relative to the current block in the current frame. One MV is a two-dimensional vector containing a horizontal displacement component and a vertical displacement component; one MV corresponds to two frames, each having a Picture Order Count (POC) indicating its number in display order, so one MV also corresponds to one POC difference value. The POC difference is linearly related to the time interval. Scaling of motion vectors typically uses POC-difference-based scaling to convert a motion vector between one pair of pictures into a motion vector between another pair of pictures.
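A minimal sketch of the POC-difference-based scaling mentioned above is given below; it only illustrates the linear relationship and omits the fixed-point arithmetic and clipping used in real codecs, and all names are illustrative.

```cpp
struct Mv { int x; int y; };

// Sketch of POC-difference-based motion vector scaling: a motion vector
// defined between a picture pair with POC difference tdSrc is converted to
// a picture pair with POC difference tdDst by the ratio of the two POC
// differences. Real codecs use fixed-point arithmetic with clipping.
Mv scaleMv(const Mv& mv, int tdSrc, int tdDst) {
    if (tdSrc == 0 || tdSrc == tdDst) return mv;        // nothing to scale
    double scale = static_cast<double>(tdDst) / tdSrc;  // ratio of POC differences
    double sx = mv.x * scale, sy = mv.y * scale;
    return { static_cast<int>(sx >= 0 ? sx + 0.5 : sx - 0.5),
             static_cast<int>(sy >= 0 ? sy + 0.5 : sy - 0.5) };
}
```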
The following two common inter prediction modes are used.
1) Advanced Motion Vector Prediction (AMVP) mode: the inter-frame prediction direction (forward, backward or bi-directional), reference frame index (reference index), motion vector predictor index (MVP index), and motion vector difference (MVD) used by the current block are identified in the code stream; the reference frame list to be used is determined by the inter-frame prediction direction, the reference frame pointed to by the MV of the current block is determined by the reference frame index, the motion vector predictor index indicates which MVP in the MVP list is used as the predictor of the MV of the current block, and the MVP and the MVD are added to obtain the MV (a sketch of this reconstruction step is given after this list).
2) Merge/skip (merge/skip) mode: identifying a merge index (merge index) in the bitstream, selecting a merge candidate (merge candidate) from a merge candidate list (merge candidate list) according to the merge index (merge index), wherein the motion vector information (including prediction direction, reference frame, motion vector) of the current block is determined by the merge candidate (merge candidate). The main difference between the merge mode and the skip mode is that the merge mode implies that the current block has residual information, and the skip mode implies that the current block has no residual information (or the residual is 0); the two modes derive motion information in the same way.
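The following C++ sketch illustrates the MVP + MVD reconstruction step of the AMVP mode referenced in item 1) above; the types and names are hypothetical and do not correspond to standard syntax elements.

```cpp
struct Mv { int x; int y; };

// Sketch of decoder-side AMVP motion vector reconstruction: the parsed MVP
// index selects a predictor from the MVP list, and the parsed motion vector
// difference (MVD) is added to it to recover the motion vector.
Mv reconstructAmvpMv(const Mv mvpList[], int mvpIdx, const Mv& mvd) {
    const Mv& mvp = mvpList[mvpIdx];
    return { mvp.x + mvd.x, mvp.y + mvd.y };
}
```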
In the HEVC standard, a fusion candidate may be the motion information of an image block adjacent to the current block, referred to as a spatial fusion candidate (spatial merge candidate), or the motion information of the image block at the corresponding position of the current block in another coded image, referred to as a temporal fusion candidate (temporal merge candidate). Further, the fusion candidate may be a bi-predictive fusion candidate (combined bi-predictive merge candidate), in which the forward motion information of one fusion candidate is combined with the backward motion information of another fusion candidate, or a zero motion vector fusion candidate (zero merge candidate), in which the motion vector is forced to be the 0 vector.
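To make the combined bi-predictive fusion candidate concrete, the following sketch combines the forward (L0) motion information of one candidate with the backward (L1) motion information of another; the types and field names are hypothetical, and availability and pruning checks are omitted.

```cpp
#include <optional>

struct Mv { int x; int y; };

struct MergeCand {
    bool hasL0 = false, hasL1 = false;  // which reference lists carry motion information
    Mv   mvL0{}, mvL1{};
    int  refIdxL0 = -1, refIdxL1 = -1;
};

// Sketch of forming a combined bi-predictive fusion candidate: take the
// forward (L0) part of candidate a and the backward (L1) part of candidate b.
// Returns std::nullopt when the required parts are not available.
std::optional<MergeCand> combineBiPredictive(const MergeCand& a, const MergeCand& b) {
    if (!a.hasL0 || !b.hasL1) return std::nullopt;
    MergeCand c;
    c.hasL0 = true;  c.mvL0 = a.mvL0;  c.refIdxL0 = a.refIdxL0;
    c.hasL1 = true;  c.mvL1 = b.mvL1;  c.refIdxL1 = b.refIdxL1;
    return c;
}
```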
Intra-frame prediction: and generating a prediction image of the current block according to the spatial adjacent pixels of the current block. An intra prediction mode corresponds to a method of generating a prediction image.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
FIG. 1 is a block diagram of a video coding system 1 of one example described in an embodiment of the present application. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system 1 are configured to predict motion information, such as motion vectors, of a currently coded image block or a sub-block thereof according to various method examples described in any one of a plurality of new inter prediction modes proposed in the present application, such that the predicted motion vectors are maximally close to the motion vectors obtained using a motion estimation method, thereby eliminating the need to transmit motion vector differences when encoding, and further improving the coding and decoding performance.
As shown in fig. 1, video coding system 1 includes a source device 10 and a destination device 20. Source device 10 generates encoded video data. Accordingly, source device 10 may be referred to as a video encoding device. Destination device 20 may decode the encoded video data generated by source device 10. Destination device 20 may therefore be referred to as a video decoding device. Various implementations of source device 10, destination device 20, or both may include one or more processors and memory coupled to the one or more processors. The memory can include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures that can be accessed by a computer, as described herein.
Source device 10 and destination device 20 may comprise a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 20 may receive encoded video data from source device 10 over link 30. Link 30 may comprise one or more media or devices capable of moving encoded video data from source device 10 to destination device 20. In one example, link 30 may comprise one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real-time. In this example, source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 20. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include a router, switch, base station, or other apparatus that facilitates communication from source device 10 to destination device 20.
In another example, encoded data may be output from output interface 140 to storage device 40. Similarly, encoded data may be accessed from storage device 40 through input interface 240. Storage device 40 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
In another example, storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 10. Destination device 20 may access the stored video data from storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 20. Example file servers include web servers (e.g., for a website), FTP servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 20 may access the encoded video data over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
The motion vector prediction techniques of the present application may be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding for video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system 1 illustrated in fig. 1 is merely an example, and the techniques of this application may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, the data is retrieved from local storage, streamed over a network, and so forth. A video encoding device may encode and store data to a memory, and/or a video decoding device may retrieve and decode data from a memory. In many examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
In the example of fig. 1, source device 10 includes video source 120, video encoder 100, and output interface 140. In some examples, output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. Video source 120 may comprise a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
Video encoder 100 may encode video data from video source 120. In some examples, source device 10 transmits the encoded video data directly to destination device 20 via output interface 140. In other examples, encoded video data may also be stored onto storage device 40 for later access by destination device 20 for decoding and/or playback.
In the example of fig. 1, destination device 20 includes input interface 240, video decoder 200, and display device 220. In some examples, input interface 240 includes a receiver and/or a modem. Input interface 240 may receive encoded video data via link 30 and/or from storage device 40. Display device 220 may be integrated with destination device 20 or may be external to destination device 20. In general, display device 220 displays decoded video data. The display device 220 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or other types of display devices.
Although not shown in fig. 1, in some aspects, video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. In some examples, the MUX-DEMUX unit may conform to the ITU h.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits such as: one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the present application is implemented in part in software, a device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may execute the instructions in hardware using one or more processors to implement the techniques of the present application. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.
This application may generally refer to video encoder 100 as "signaling" or "transmitting" certain information to another device, such as video decoder 200. The terms "signaling" or "transmitting" may generally refer to the communication of syntax elements and/or other data used to decode compressed video data. This transfer may occur in real time or near real time. Alternatively, such communication may occur over a period of time, such as may occur when, at the time of encoding, syntax elements are stored in the encoded codestream to a computer-readable storage medium, which the decoding device may then retrieve at any time after the syntax elements are stored to such medium.
The H.265 (HEVC) standard was developed by JCT-VC. HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM). The latest standard document for H.265 is available from http://www.itu.int/REC/T-REC-H.265, the latest version of the standard document being H.265 (12/16), which is incorporated herein by reference in its entirety. The HM assumes that the video decoding device has several additional capabilities with respect to existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while the HM may provide up to 35 intra-prediction encoding modes.
JVET is dedicated to developing the H.266 standard. The process of H.266 standardization is based on an evolving model of the video decoding apparatus called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, with the latest algorithm description contained in JVET-F1001-v2, which is incorporated herein by reference in its entirety. Also, reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is incorporated herein by reference in its entirety.
In general, the working model description for the HM may divide a video frame or image into a sequence of treeblocks or Largest Coding Units (LCUs), also referred to as CTUs, that include both luma and chroma samples. Treeblocks have a purpose similar to that of macroblocks of the H.264 standard. A slice includes a number of consecutive treeblocks in decoding order. A video frame or image may be partitioned into one or more slices. Each treeblock may be split into coding units according to a quadtree. For example, a treeblock that is the root node of a quadtree may be split into four child nodes, and each child node may in turn be a parent node and be split into four other child nodes. The final non-splittable child node, which is a leaf node of the quadtree, comprises a decoding node, e.g., a decoded video block. Syntax data associated with the decoded codestream may define the maximum number of times the treeblock may be split, and may also define the minimum size of the decoding node.
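A minimal sketch of this recursive quadtree splitting follows; the split decision is a placeholder (a real encoder decides it by rate-distortion optimization and signals it in the code stream), and the function names are illustrative.

```cpp
#include <cstdio>

// Placeholder split decision: in a real encoder this is a rate-distortion
// choice signalled in the code stream; here we simply split down to 16x16.
bool shouldSplit(int /*x*/, int /*y*/, int size) { return size > 16; }

// Sketch of recursive quadtree partitioning of a treeblock (CTU): a square
// region is either kept as a leaf (decoding node) or split into four equally
// sized child nodes, down to a minimum size.
void splitQuadTree(int x, int y, int size, int minSize) {
    if (size <= minSize || !shouldSplit(x, y, size)) {
        std::printf("leaf node at (%d,%d), size %dx%d\n", x, y, size, size);
        return;
    }
    int half = size / 2;
    splitQuadTree(x,        y,        half, minSize);  // top-left child
    splitQuadTree(x + half, y,        half, minSize);  // top-right child
    splitQuadTree(x,        y + half, half, minSize);  // bottom-left child
    splitQuadTree(x + half, y + half, half, minSize);  // bottom-right child
}
```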
An encoding unit includes a decoding node and a prediction unit (PU) and a Transform Unit (TU) associated with the decoding node. The size of a CU corresponds to the size of the decoding node and must be square in shape. The size of a CU may range from 8 × 8 pixels up to a maximum treeblock size of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe a situation in which the CU is partitioned into one or more PUs. The partition mode may differ between the cases where the CU is skip mode encoded, direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. The PU may be partitioned into shapes other than square. For example, syntax data associated with a CU may also describe a situation in which the CU is partitioned into one or more TUs according to a quadtree. The TU may be square or non-square in shape.
The HEVC standard allows transform according to TUs, which may be different for different CUs. A TU is typically sized based on the size of the PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The size of a TU is typically the same as or smaller than that of a PU. In some possible implementations, the residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
In general, a PU includes data related to a prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode of the PU. As another possible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. For example, the data defining the motion vector for the PU may describe a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list of the motion vector (e.g., list 0, list 1, or list C).
In general, TUs use a transform and quantization process. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 100 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using TUs to produce serialized transform coefficients for entropy decoding. The term "video block" is generally used herein to refer to a decoding node of a CU. In some particular applications, the present application may also use the term "video block" to refer to a treeblock that includes a decoding node as well as PUs and TUs, e.g., an LCU or CU.
A video sequence typically comprises a series of video frames or images. A group of pictures (GOP) illustratively comprises a series of one or more video pictures. The GOP may include syntax data in header information of the GOP, header information of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes the coding mode of the respective picture. Video encoder 100 typically operates on video blocks within individual video slices in order to encode the video data. The video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes and may differ in size according to a specified decoding standard.
As a possible implementation, the HM supports prediction of various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction of PU sizes of 2N × 2N or N × N, and inter prediction of symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction for PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% section is indicated by "n" followed by "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a horizontally partitioned 2N × 2N CU with a 2N × 0.5N PU on top and a 2N × 1.5N PU on the bottom.
In this application, "N × N" and "N by N" are used interchangeably to refer to the pixel size of a video block in both the vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After using intra-predictive or inter-predictive decoding of PUs of the CU, video encoder 100 may calculate residual data for TUs of the CU. A PU may comprise pixel data in a spatial domain (also referred to as a pixel domain), and a TU may comprise coefficients in a transform domain after applying a transform (e.g., a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) to residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 100 may form TUs that include residual data of a CU, and then transform the TUs to generate transform coefficients for the CU.
After any transform to generate transform coefficients, video encoder 100 may perform quantization of the transform coefficients. Quantization exemplarily refers to a process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be reduced to an m-bit value during quantization, where n is greater than m.
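As a toy illustration of how quantization reduces the amount of data used to represent a coefficient, the following sketch divides a coefficient by a quantization step and rounds; this is not the actual HEVC/VTM quantizer, which uses fixed-point scaling tables, and the names are illustrative.

```cpp
// Toy quantizer: dividing a transform coefficient by a quantization step
// shrinks its dynamic range (an n-bit value may become an m-bit value with
// m < n) at the cost of rounding error. Not the real HEVC quantizer.
int quantize(int coeff, int qStep) {
    return (coeff >= 0 ? coeff + qStep / 2 : coeff - qStep / 2) / qStep;
}

// Inverse quantization recovers only an approximation of the original coefficient.
int dequantize(int level, int qStep) {
    return level * qStep;
}
```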
The JEM model further improves the coding structure of video images; in particular, a block coding structure called "quadtree plus binary tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC and supports more flexible CU partition shapes, where a CU may be square or rectangular. A CTU is first partitioned by a quadtree, and the leaf nodes of the quadtree are further partitioned by a binary tree. There are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning. The leaf nodes of the binary tree are called CUs, and a CU in JEM cannot be further divided during prediction and transform processing, i.e., a CU, its PU, and its TU in JEM have the same block size. In JEM at the present stage, the maximum size of the CTU is 256 × 256 luminance pixels.
In some possible implementations, video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other possible implementations, video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method. Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
To perform CABAC, video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 100 may select a variable length code for a symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve bit-rate savings relative to using equal-length codewords for each symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
In embodiments of the present application, a video encoder may perform inter prediction to reduce temporal redundancy between pictures. As described previously, a CU may have one or more prediction units, PUs, according to the specifications of different video compression codec standards. In other words, multiple PUs may belong to a CU, or the PUs and the CU are the same size. When the CU and PU sizes are the same, the partition mode of the CU is not partitioned, or is partitioned into one PU, and is expressed by using the PU collectively herein. When the video encoder performs inter prediction, the video encoder may signal the video decoder with motion information for the PU. For example, the motion information of the PU may include: reference picture index, motion vector and prediction direction identification. The motion vector may indicate a displacement between an image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference block of the PU. The reference block of the PU may be a portion of a reference picture that is similar to the image block of the PU. The reference block may be located in a reference picture indicated by the reference picture index and the prediction direction identification.
To reduce the number of coding bits needed to represent the motion information of the PU, the video encoder may generate a list of candidate prediction motion vectors for each of the PUs according to a merge prediction mode or advanced motion vector prediction mode process. Each candidate predictive motion vector in the list of candidate predictive motion vectors for the PU may indicate motion information. The motion information indicated by some candidate predicted motion vectors in the candidate predicted motion vector list may be based on motion information of other PUs. The present application may refer to a candidate predicted motion vector as an "original" candidate predicted motion vector if the candidate predicted motion vector indicates motion information that specifies one of a spatial candidate predicted motion vector position or a temporal candidate predicted motion vector position. For example, for merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate predicted motion vector positions and one original temporal candidate predicted motion vector position. In some examples, the video encoder may generate additional candidate predicted motion vectors by combining partial motion vectors from different original candidate predicted motion vectors, modifying the original candidate predicted motion vectors, or inserting only zero motion vectors as candidate predicted motion vectors. These additional candidate predicted motion vectors are not considered as original candidate predicted motion vectors and may be referred to as artificially generated candidate predicted motion vectors in this application.
The techniques of this application generally relate to techniques for generating a list of candidate predictive motion vectors at a video encoder and techniques for generating the same list of candidate predictive motion vectors at a video decoder. The video encoder and the video decoder may generate the same candidate prediction motion vector list by implementing the same techniques for constructing the candidate prediction motion vector list. For example, both the video encoder and the video decoder may construct a list with the same number of candidate predicted motion vectors (e.g., five candidate predicted motion vectors). The video encoder and decoder may first consider spatial candidate predictive motion vectors (e.g., neighboring blocks in the same picture), then temporal candidate predictive motion vectors (e.g., candidate predictive motion vectors in different pictures), and finally may consider artificially generated candidate predictive motion vectors until the desired number of candidate predictive motion vectors is added to the list. According to the techniques of this application, a pruning operation may be utilized during candidate predicted motion vector list construction for certain types of candidate predicted motion vectors in order to remove duplicates from the candidate predicted motion vector list, while for other types of candidate predicted motion vectors, pruning may not be used in order to reduce decoder complexity. For example, for the set of spatial candidate predicted motion vectors and for the temporal candidate predicted motion vector, a pruning operation may be performed to exclude candidate predicted motion vectors with duplicate motion information from the list of candidate predicted motion vectors. However, an artificially generated candidate predicted motion vector may be added to the list of candidate predicted motion vectors without performing a pruning operation on it.
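The following sketch illustrates this selective pruning during list construction; the candidate type, the equality test, and the fixed list size are illustrative assumptions, not the exact rules of any standard.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative candidate type; equality compares the full motion information.
struct Cand {
    int interDir, refIdx, mvx, mvy;
    bool operator==(const Cand& o) const {
        return interDir == o.interDir && refIdx == o.refIdx &&
               mvx == o.mvx && mvy == o.mvy;
    }
};

// Spatial/temporal candidates: checked against the list for duplicates
// (pruning) before insertion.
void addWithPruning(std::vector<Cand>& list, const Cand& c, std::size_t maxSize) {
    if (list.size() < maxSize &&
        std::find(list.begin(), list.end(), c) == list.end())
        list.push_back(c);
}

// Artificially generated candidates: appended without a duplicate check.
void addWithoutPruning(std::vector<Cand>& list, const Cand& c, std::size_t maxSize) {
    if (list.size() < maxSize)
        list.push_back(c);
}
```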
After generating the candidate predictive motion vector list for the PU of the CU, the video encoder may select a candidate predictive motion vector from the candidate predictive motion vector list and output a candidate predictive motion vector index in the codestream. The selected candidate predictive motion vector may be the candidate predictive motion vector having a motion vector that yields the predictor that most closely matches the target PU being decoded. The candidate predicted motion vector index may indicate a position in the candidate predicted motion vector list where the candidate predicted motion vector is selected. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. The motion information for the PU may be determined based on the motion information indicated by the selected candidate predictive motion vector. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector. The video encoder may generate one or more residual tiles for the CU based on the predictive tiles of the PUs of the CU and the original tiles for the CU. The video encoder may then encode the one or more residual image blocks and output the one or more residual image blocks in the code stream.
The codestream may include data identifying a selected candidate predictive motion vector in a candidate predictive motion vector list for the PU. The video decoder may determine the motion information for the PU based on the motion information indicated by the selected candidate predictive motion vector in the candidate predictive motion vector list for the PU. The video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on the one or more reference blocks of the PU. The video decoder may reconstruct the tiles for the CU based on the predictive tiles for the PUs of the CU and the one or more residual tiles for the CU.
For ease of explanation, this application may describe locations or image blocks as having various spatial relationships with CUs or PUs. This description may be interpreted to mean that the locations or tiles have various spatial relationships with the tiles associated with the CU or PU. Furthermore, the present application may refer to a PU that is currently being decoded by the video decoder as a current PU, also referred to as a current pending image block. This application may refer to a CU that a video decoder is currently decoding as the current CU. The present application may refer to a picture that is currently being decoded by a video decoder as a current picture. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used uniformly for representation.
As briefly described above, video encoder 100 may use inter prediction to generate predictive image blocks and motion information for PUs of a CU. In many instances, the motion information for a given PU may be the same as or similar to the motion information of one or more nearby PUs (i.e., PUs whose tiles are spatially or temporally nearby to the tiles of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs. Encoding motion information for a given PU with reference to motion information for nearby PUs may reduce the number of encoding bits required in the codestream to indicate the motion information for the given PU.
Video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs. This application may use merge mode to refer to indicating that the motion information of a given PU is the same as or derivable from the motion information of nearby PUs. In another possible implementation, video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU. The MVD indicates the difference between the motion vector of a given PU and the motion vectors of nearby PUs. Video encoder 100 may include the motion vector for the MVD in the motion information for the given PU instead of the given PU. Fewer coding bits are required to represent the MVD in the codestream than to represent the motion vector for a given PU. The present application may use advanced motion vector prediction mode to refer to signaling motion information of a given PU to a decoding end by using an MVD and an index value identifying a candidate motion vector.
To signal motion information for a given PU at a decoding end using merge mode or AMVP mode, video encoder 100 may generate a list of candidate predictive motion vectors for the given PU. The candidate predictive motion vector list may include one or more candidate predictive motion vectors. Each of the candidate predictive motion vectors in the candidate predictive motion vector list for a given PU may specify motion information. The motion information indicated by each candidate predicted motion vector may include a motion vector, a reference picture index, and a prediction direction identification. The candidate predicted motion vectors in the candidate predicted motion vector list may comprise "original" candidate predicted motion vectors, where each indicates motion information for one of the specified candidate predicted motion vector positions within a PU that is different from the given PU.
After generating the list of candidate predictive motion vectors for the PU, video encoder 100 may select one of the candidate predictive motion vectors from the list of candidate predictive motion vectors for the PU. For example, the video encoder may compare each candidate predictive motion vector to the PU being decoded and may select a candidate predictive motion vector with the desired rate-distortion cost. Video encoder 100 may output the candidate prediction motion vector index for the PU. The candidate predicted motion vector index may identify a position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
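A minimal sketch of this encoder-side selection follows; rdCost stands in for a real distortion-plus-rate computation and is an assumed callable, not an API of any reference encoder.

```cpp
#include <cstddef>
#include <functional>
#include <limits>

// Sketch of encoder-side candidate selection: evaluate every candidate
// predicted motion vector with a rate-distortion cost and keep the index of
// the cheapest one, which is then signalled in the code stream.
std::size_t selectCandidate(std::size_t numCands,
                            const std::function<double(std::size_t)>& rdCost) {
    std::size_t best = 0;
    double bestCost = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < numCands; ++i) {
        double cost = rdCost(i);  // distortion + lambda * rate for candidate i
        if (cost < bestCost) { bestCost = cost; best = i; }
    }
    return best;
}
```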
Furthermore, video encoder 100 may generate the predictive picture block for the PU based on the reference block indicated by the motion information of the PU. The motion information for the PU may be determined based on motion information indicated by a selected candidate predictive motion vector in a list of candidate predictive motion vectors for the PU. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In AMVP mode, the motion information for the PU may be determined based on the motion vector difference for the PU and the motion information indicated by the selected candidate prediction motion vector. Video encoder 100 may process the predictive image blocks for the PU as described previously.
When video decoder 200 receives the codestream, video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU. The candidate prediction motion vector list generated by video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by video encoder 100 for the PU. The syntax element parsed from the codestream may indicate a location in the candidate predicted motion vector list for the PU where the candidate predicted motion vector is selected. After generating the list of candidate prediction motion vectors for the PU, video decoder 200 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU. Video decoder 200 may determine the motion information for the PU based on the motion information indicated by the selected candidate predictive motion vector in the list of candidate predictive motion vectors for the PU. Video decoder 200 may reconstruct the tiles for the CU based on the predictive tiles for the PU and the residual tiles for the CU.
It should be understood that, in a possible implementation manner, at the decoding end, the construction of the candidate predicted motion vector list and the parsing of the selected candidate predicted motion vector from the code stream in the candidate predicted motion vector list are independent of each other, and may be performed in any order or in parallel.
In another possible implementation, at the decoding end, the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the code stream, and the candidate predicted motion vector list is then constructed according to the parsed position. For example, when the selected candidate predicted motion vector obtained by parsing the code stream is the candidate predicted motion vector with index 3 in the candidate predicted motion vector list, the candidate predicted motion vector with index 3 can be determined by constructing the candidate predicted motion vector list only from index 0 to index 3, which achieves the technical effects of reducing complexity and improving decoding efficiency.
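The following sketch illustrates this decoder-side shortcut of building the list only up to the parsed index; deriveCandidate is a hypothetical placeholder for the spatial/temporal/artificial derivation steps.

```cpp
#include <vector>

struct Cand { int mvx, mvy, refIdx; };

// Hypothetical placeholder for the real derivation of the candidate at a
// given list position (spatial, temporal, or artificially generated).
Cand deriveCandidate(int position) { return { position, position, 0 }; }

// Build the candidate list only up to the index parsed from the code stream
// and return the selected candidate, so construction can stop early.
Cand buildListUpToIndex(int parsedIndex) {
    std::vector<Cand> list;
    for (int pos = 0; pos <= parsedIndex; ++pos)
        list.push_back(deriveCandidate(pos));
    return list.back();  // the candidate at the parsed index
}
```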
Fig. 2 is a block diagram of a video encoder 100 of one example described in an embodiment of the present application. The video encoder 100 is used to output video to the post-processing entity 41. Post-processing entity 41 represents an example of a video entity, such as a media-aware network element (MANE) or a splicing/editing device, that may process the encoded video data from video encoder 100. In some cases, post-processing entity 41 may be an instance of a network entity. In some video encoding systems, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100. In some examples, post-processing entity 41 is an example of storage device 40 of FIG. 1.
In the example of fig. 2, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a Decoded Picture Buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. Filter unit 106 is intended to represent one or more loop filters, such as deblocking filters, Adaptive Loop Filters (ALF), and Sample Adaptive Offset (SAO) filters. Although filter unit 106 is shown in fig. 2 as an in-loop filter, in other implementations, filter unit 106 may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown).
The video data memory may store video data to be encoded by components of video encoder 100. The video data stored in the video data memory may be obtained from video source 120. DPB 107 may be a reference picture memory that stores reference video data used by video encoder 100 to encode video data in intra or inter coding modes. The video data memory and DPB 107 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with the other components of video encoder 100, or off-chip relative to those components.
As shown in fig. 2, video encoder 100 receives video data and stores the video data in a video data memory. The partitioning unit partitions the video data into image blocks and these image blocks may be further partitioned into smaller blocks, e.g. image block partitions based on a quadtree structure or a binary tree structure. This segmentation may also include segmentation into stripes (slices), slices (tiles), or other larger units. Video encoder 100 generally illustrates components that encode image blocks within a video slice to be encoded. The slice may be divided into a plurality of tiles (and possibly into a set of tiles called a slice). Prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes. Prediction processing unit 108 may provide the resulting intra, inter coded block to summer 112 to generate a residual block and to summer 111 to reconstruct the encoded block used as the reference picture.
An intra predictor 109 within prediction processing unit 108 may perform intra-predictive encoding of the current block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy. Inter predictor 110 within prediction processing unit 108 may perform inter-predictive encoding of the current block relative to one or more prediction blocks in one or more reference pictures to remove temporal redundancy.
In particular, the inter predictor 110 may be used to determine an inter prediction mode for encoding a current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for various inter prediction modes in the set of candidate inter prediction modes and select the inter prediction mode having the best rate-distortion characteristics therefrom. Rate distortion analysis typically determines the amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine an inter prediction mode with the smallest rate-distortion cost for encoding the current image block in the candidate inter prediction mode set as the inter prediction mode for inter predicting the current image block.
The inter predictor 110 is configured to predict motion information (e.g., a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and acquire or generate a prediction block of the current image block using the motion information (e.g., the motion vector) of the one or more sub-blocks in the current image block. The inter predictor 110 may locate the prediction block to which the motion vector points in one of the reference picture lists. The inter predictor 110 may also generate syntax elements associated with the image block and the video slice for use by the video decoder 200 in decoding the image block of the video slice. Or, in an example, the inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block of each sub-block, so as to obtain a prediction block of the current image block; it should be understood that the inter predictor 110 herein performs motion estimation and motion compensation processes.
Specifically, after selecting the inter prediction mode for the current image block, the inter predictor 110 may provide information indicating the selected inter prediction mode for the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
The intra predictor 109 may perform intra prediction on the current image block. In particular, the intra predictor 109 may determine an intra prediction mode used to encode the current block. For example, the intra predictor 109 may calculate rate-distortion values for various intra prediction modes to be tested using rate-distortion analysis and select an intra prediction mode having the best rate-distortion characteristics from among the modes to be tested. In any case, after selecting the intra prediction mode for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode for the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After prediction processing unit 108 generates a prediction block for the current image block via inter prediction or intra prediction, video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 112 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more TUs and applied to transformer 101. The transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. Transformer 101 may convert the residual video data from a pixel value domain to a transform domain, e.g., the frequency domain.
The transformer 101 may send the resulting transform coefficients to the quantizer 102. Quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 102 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, the entropy encoder 103 may perform a scan.
After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by the entropy encoder 103, the encoded codestream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200. The entropy encoder 103 may also entropy encode syntax elements of the current image block to be encoded.
Inverse quantizer 104 and inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference image. The summer 111 adds the reconstructed residual block to the prediction block produced by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be used to filter the reconstructed image block to reduce distortion such as blocking artifacts. This reconstructed image block is then stored as a reference block in the decoded image buffer 107 and may be used by the inter predictor 110 as a reference block to inter predict a block in a subsequent video frame or image.
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101 and correspondingly without processing by the inverse transformer 105; alternatively, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need to be processed by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; alternatively, the video encoder 100 may store the reconstructed picture block directly as a reference block without processing by the filter unit 106; alternatively, the quantizer 102 and the dequantizer 104 in the video encoder 100 may be combined together.
Fig. 3 is a block diagram of a video decoder 200 of one example described in an embodiment of the present application. In the example of fig. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from fig. 2.
In the decoding process, video decoder 200 receives an encoded video bitstream representing an image block and associated syntax elements of an encoded video slice from video encoder 100. Video decoder 200 may receive video data from network entity 42 and, optionally, may store the video data in a video data memory (not shown). The video data memory may store video data, such as an encoded video bitstream, to be decoded by components of video decoder 200. The video data stored in the video data memory may be obtained, for example, from storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not illustrated in fig. 3, the video data memory and the DPB 207 may be the same memory or may be separately provided memories. Video data memory and DPB 207 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of video decoder 200, or disposed off-chip with respect to those components.
Network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or other such device for implementing one or more of the techniques described above. Network entity 42 may or may not include a video encoder, such as video encoder 100. Network entity 42 may implement portions of the techniques described in this application before network entity 42 sends the encoded video bitstream to video decoder 200. In some video decoding systems, network entity 42 and video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200. In some cases, network entity 42 may be an example of storage 40 of fig. 1.
The entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. Video decoder 200 may receive syntax elements at the video slice level and/or the picture block level.
When a video slice is decoded as an intra-decoded (I) slice, intra predictor 209 of prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video slice is decoded as an inter-decoded (i.e., B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on syntax elements received from the entropy decoder 203, an inter prediction mode for decoding a current image block of the current video slice, and decode the current image block (e.g., perform inter prediction) based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether the current image block of the current video slice is predicted using a new inter prediction mode; if the syntax elements indicate that the current image block is predicted using a new inter prediction mode, the inter predictor 210 predicts the motion information of the current image block or a sub-block of the current image block of the current video slice based on the new inter prediction mode (e.g., a new inter prediction mode designated by a syntax element or a default new inter prediction mode), and then uses the predicted motion information of the current image block or the sub-block of the current image block to obtain or generate a prediction block for the current image block or the sub-block of the current image block through a motion compensation process. The motion information herein may include reference picture information and motion vectors, wherein the reference picture information may include, but is not limited to, uni/bi-directional prediction information, reference picture list numbers, and reference picture indexes corresponding to the reference picture lists. For inter prediction, a prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 200 may construct reference picture lists, i.e., list 0 and list 1, based on the reference pictures stored in DPB 207. The reference frame index for the current picture may be included in one or more of reference frame list 0 and list 1. In some examples, video encoder 100 may signal a particular syntax element indicating whether a new inter prediction mode is employed to decode a particular block, or may signal a particular syntax element indicating both whether a new inter prediction mode is employed and which new inter prediction mode is specifically employed to decode the particular block. It should be understood that the inter predictor 210 herein performs a motion compensation process.
The inverse quantizer 204 inversely quantizes, i.e., dequantizes, the quantized transform coefficients provided in the codestream and decoded by the entropy decoder 203. The inverse quantization process may include: using the quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and, likewise, the degree of inverse quantization that should be applied. Inverse transformer 205 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to generate a residual block in the pixel domain.
After the inter predictor 210 generates a prediction block for the current image block or a sub-block of the current image block, the video decoder 200 obtains a reconstructed block, i.e., a decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210. Summer 211 represents the component that performs this summation operation. A loop filter (in or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality, if desired. Filter unit 206 may represent one or more loop filters, such as deblocking filters, Adaptive Loop Filters (ALF), and Sample Adaptive Offset (SAO) filters. Although the filter unit 206 is shown in fig. 3 as an in-loop filter, in other implementations, the filter unit 206 may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as a decoded video stream. Decoded image blocks in a given frame or picture may also be stored in the decoded picture buffer 207, which stores reference pictures used for subsequent motion compensation. The decoded picture buffer 207 may be part of a memory, which may also store decoded video for later presentation on a display device (e.g., display device 220 of fig. 1), or may be separate from such memory.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video stream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; alternatively, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients and accordingly does not need to be processed by the inverse quantizer 204 and the inverse transformer 205.
As noted previously, the techniques of this application illustratively relate to inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video coders described in this application, including, for example, video encoder 100 and video decoder 200 as shown and described with respect to fig. 1-3. That is, in one possible implementation, the inter predictor 110 described with respect to fig. 2 may perform certain techniques described below when performing inter prediction during encoding of a block of video data. In another possible implementation, the inter predictor 210 described with respect to fig. 3 may perform certain techniques described below when performing inter prediction during decoding of a block of video data. Thus, reference to a generic "video encoder" or "video decoder" may include video encoder 100, video decoder 200, or another video encoding or decoding unit.
Fig. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application. The inter prediction module 121, for example, may include a motion estimation unit and a motion compensation unit. The relationship between PU and CU varies among different video compression codec standards. Inter prediction module 121 may partition the current CU into PUs according to a plurality of partition modes. For example, inter prediction module 121 may partition the current CU into PUs according to 2Nx2N, 2NxN, Nx2N, and NxN partition modes. In other embodiments, the current CU is the current PU, and is not limited.
The inter prediction module 121 may perform Integer Motion Estimation (IME) and then Fractional Motion Estimation (FME) for each of the PUs. When the inter prediction module 121 performs IME on the PU, the inter prediction module 121 may search one or more reference pictures for a reference block for the PU. After finding the reference block for the PU, inter prediction module 121 may generate a motion vector that indicates, with integer precision, a spatial displacement between the PU and the reference block for the PU. When the inter prediction module 121 performs FME on a PU, the inter prediction module 121 may refine a motion vector generated by performing IME on the PU. The motion vectors generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.). After generating the motion vectors for the PU, inter prediction module 121 may use the motion vectors for the PU to generate a predictive image block for the PU.
In some possible implementations where the inter prediction module 121 signals the motion information of the PU to the decoding end using AMVP mode, the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate predicted motion vectors may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors. After generating the candidate prediction motion vector list for the PU, inter prediction module 121 may select a candidate prediction motion vector from the candidate prediction motion vector list and generate a Motion Vector Difference (MVD) for the PU. The MVD for the PU may indicate a difference between the motion vector indicated by the selected candidate prediction motion vector and the motion vector generated for the PU using the IME and FME. In these possible implementations, the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. The inter prediction module 121 may also output the MVD of the PU. A possible implementation of the Advanced Motion Vector Prediction (AMVP) mode in the embodiment of the present application is described in detail below with respect to fig. 6.
In addition to generating motion information for the PUs by performing IME and FME on the PUs, inter prediction module 121 may also perform a Merge operation on each of the PUs. When inter prediction module 121 performs a merge operation on a PU, inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate predictive motion vectors for the PU may include one or more original candidate predictive motion vectors and one or more additional candidate predictive motion vectors derived from the original candidate predictive motion vectors. The original candidate predicted motion vectors in the list of candidate predicted motion vectors may include one or more spatial candidate predicted motion vectors and a temporal candidate predicted motion vector. A spatial candidate prediction motion vector may indicate motion information of another PU in the current picture. The temporal candidate prediction motion vector may be based on motion information of a corresponding PU in a picture different from the current picture. The temporal candidate prediction motion vector may also be referred to as Temporal Motion Vector Prediction (TMVP).
After generating the candidate prediction motion vector list, the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. Inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. FIG. 5, described below, illustrates an exemplary flow diagram for Merge.
After generating the predictive image block for the PU based on the IME and FME and the predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block generated by the FME operation or the predictive image block generated by the merge operation. In some possible implementations, the inter prediction module 121 may select the predictive image block for the PU based on rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
After inter prediction module 121 has selected the predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes (in some implementations, after a coding tree unit (CTU) is divided into CUs, a CU is not further divided into smaller PUs, in which case the PU is equivalent to the CU), inter prediction module 121 may select the partition mode for the current CU. In some implementations, the inter prediction module 121 may select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes. Inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to residual generation module 102. Inter prediction module 121 may output syntax elements indicating the motion information of the PUs belonging to the selected partition mode to entropy encoding module 116.
In the diagram of fig. 4, the inter-frame prediction module 121 includes IME modules 180A-180N (collectively referred to as "IME module 180"), FME modules 182A-182N (collectively referred to as "FME module 182"), merging modules 184A-184N (collectively referred to as "merging modules 184"), PU mode decision modules 186A-186N (collectively referred to as "PU mode decision modules 186"), and a CU mode decision module 188 (which may also include performing a mode decision process from the CTU to the CU).
The IME module 180, FME module 182, and merge module 184 may perform IME operations, FME operations, and merge operations on PUs of the current CU. The inter prediction module 121 is illustrated in the schematic diagram of fig. 4 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU. In other possible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merging module 184 for each PU of each partition mode of the CU.
As illustrated in the schematic diagram of fig. 4, IME module 180A, FME module 182A and merge module 184A may perform IME operations, FME operations, and merge operations on PUs generated by partitioning CUs according to a 2Nx2N partitioning mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, FME module 182A and the merge module 184A.
IME module 180B, FME module 182B and merge module 184B may perform IME, FME, and merge operations on the left PU resulting from partitioning the CU according to the Nx2N partitioning mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, FME module 182B and the merge module 184B.
IME module 180C, FME module 182C and merge module 184C may perform IME, FME, and merge operations on the right PU resulting from partitioning the CU according to the Nx2N partitioning mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, FME module 182C and the merge module 184C.
IME module 180N, FME module 182N and merge module 184N may perform IME, FME, and merge operations on the bottom right PU resulting from partitioning the CU according to an NxN partitioning mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, FME module 182N and the merge module 184N.
The PU mode decision module 186 may select a predictive image block based on rate-distortion cost analysis of a plurality of possible predictive image blocks, selecting the predictive image block that provides the best rate-distortion cost for a given decoding scenario. For example, for bandwidth-limited applications, the PU mode decision module 186 may be biased towards selecting predictive image blocks that increase the compression ratio, while for other applications, the PU mode decision module 186 may be biased towards selecting predictive image blocks that increase the reconstructed video quality. After PU mode decision module 186 selects the predictive image blocks for the PUs of the current CU, CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and the motion information of the PUs belonging to the selected partition mode.
Fig. 5 is an exemplary flowchart of the merge mode in the embodiment of the present application. A video encoder, such as video encoder 100, may perform merge operation 200. In other possible implementations, the video encoder may perform a merge operation that is different from merge operation 200. For example, in other possible implementations, the video encoder may perform a merge operation, where the video encoder performs more, fewer, or different steps than merge operation 200. In other possible implementations, the video encoder may perform the steps of the merge operation 200 in a different order or in parallel. The encoder may also perform a merge operation 200 on PUs encoded in skip mode.
After the video encoder begins the merge operation 200, the video encoder may generate a list of candidate prediction motion vectors for the current PU (202). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the list of candidate prediction motion vectors for the current PU according to one of the example techniques described below with respect to fig. 8-12.
As previously described, the candidate prediction motion vector list for the current PU may include a temporal candidate prediction motion vector. The temporal candidate prediction motion vector may indicate motion information of a temporally corresponding (co-located) PU. The co-located PU may be spatially in the same position in the image frame as the current PU, but in a reference image instead of the current image. The present application may refer to the reference picture that includes the temporally corresponding PU as the relevant reference picture. The present application may refer to the reference picture index of the relevant reference picture as the relevant reference picture index. As described previously, the current picture may be associated with one or more reference picture lists (e.g., list 0, list 1, etc.). The reference picture index may indicate a reference picture by indicating its position in a reference picture list. In some possible implementations, the current picture may be associated with a combined reference picture list.
In some video encoders, the relevant reference picture index is the reference picture index of the PU that encompasses the reference index source location associated with the current PU. In these video encoders, the reference index source location associated with the current PU is adjacent to the left of the current PU or adjacent above the current PU. In this application, a PU may "cover" a particular location if the image block associated with the PU includes the particular location. In these video encoders, if a reference index source location is not available, the video encoder may use a reference picture index of zero.
However, the following examples may exist: the reference index source location associated with the current PU is within the current CU. In these examples, a PU that covers the reference index source location associated with the current PU may be deemed available if the PU is above or to the left of the current CU. However, the video encoder may need to access motion information of another PU of the current CU in order to determine a reference picture that contains the co-located PU. Thus, these video encoders may use the motion information (i.e., reference picture indices) of PUs belonging to the current CU to generate temporal candidate prediction motion vectors for the current PU. In other words, these video encoders may generate temporal candidate prediction motion vectors using motion information of PUs belonging to the current CU. Thus, the video encoder may not be able to generate candidate prediction motion vector lists for the current PU and the PU that encompasses the reference index source location associated with the current PU in parallel.
In accordance with the techniques of this application, a video encoder may explicitly set a relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some possible implementations where the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed predefined preset reference picture index (e.g., 0). In this way, the video encoder may generate a temporal candidate prediction motion vector based on motion information of a co-located PU in a reference frame indicated by a preset reference picture index, and may include the temporal candidate prediction motion vector in a candidate prediction motion vector list of the current CU.
In a possible implementation where the video encoder explicitly sets the relevant reference picture index, the video encoder may explicitly signal the relevant reference picture index in a syntax structure (e.g., a picture header, a slice header, an APS, or another syntax structure). In this possible implementation, the video encoder may signal the decoding end the relevant reference picture index for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, a video encoder may signal: the associated reference picture index for each PU of the CU is equal to "1".
In some possible implementations, the relevant reference picture index may be set implicitly rather than explicitly. In these possible implementations, the video encoder may generate each temporal candidate predictive motion vector in the list of candidate predictive motion vectors for a PU of the current CU using motion information for PUs in the reference picture indicated by reference picture indices for PUs covering locations outside the current CU, even if these locations are not strictly adjacent to the current PU.
After generating the list of candidate predictive motion vectors for the current PU, the video encoder may generate a predictive image block associated with a candidate predictive motion vector in the list of candidate predictive motion vectors (204). The video encoder may generate a predictive image block associated with the candidate predictive motion vector by determining motion information for the current PU based on the motion information of the indicated candidate predictive motion vector and then generating the predictive image block based on one or more reference blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidate predictive motion vectors from the list of candidate predictive motion vectors (206). The video encoder may select the candidate prediction motion vector in various ways. For example, the video encoder may select one of the candidate predictive motion vectors based on a rate-distortion cost analysis for each of the predictive image blocks associated with the candidate predictive motion vectors.
After selecting the candidate predictive motion vector, the video encoder may output a candidate predictive motion vector index (208). The candidate predicted motion vector index may indicate the position in the candidate predicted motion vector list of the selected candidate predicted motion vector. In some possible embodiments, the candidate prediction motion vector index may be denoted as "merge_idx".
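The encoder-side merge flow of steps 202 to 208 can be summarized in a short sketch. The helper functions build_merge_candidate_list, motion_compensate, and rd_cost are hypothetical names introduced only for illustration; they are not defined in this application.

```python
def encode_pu_merge(pu, reference_pictures):
    # Step 202: build the candidate prediction motion vector list for the current PU.
    candidates = build_merge_candidate_list(pu)          # assumed helper

    best_cost, best_index, best_pred = None, None, None
    for merge_idx, candidate in enumerate(candidates):
        # Step 204: generate the predictive image block indicated by this candidate.
        pred_block = motion_compensate(pu, candidate, reference_pictures)  # assumed helper
        # Step 206: keep the candidate with the lowest rate-distortion cost.
        cost = rd_cost(pu, pred_block, merge_idx)        # assumed helper
        if best_cost is None or cost < best_cost:
            best_cost, best_index, best_pred = cost, merge_idx, pred_block

    # Step 208: output the index ("merge_idx") of the selected candidate.
    return best_index, best_pred
```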
Fig. 6 is an exemplary flowchart of an Advanced Motion Vector Prediction (AMVP) mode in an embodiment of the present application. A video encoder, such as video encoder 100, may perform AMVP operation 210.
After the video encoder begins AMVP operation 210, the video encoder may generate one or more motion vectors for the current PU (211). The video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU. As described previously, the current picture may be associated with two reference picture lists (list 0 and list 1). If the current PU is uni-directionally predicted, the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU. The list 0 motion vector may indicate a spatial displacement between the image block of the current PU and a reference block in a reference picture in list 0. The list 1 motion vector may indicate a spatial displacement between the image block of the current PU and a reference block in a reference picture in list 1. If the current PU is bi-predicted, the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
After generating the one or more motion vectors for the current PU, the video encoder may generate a predictive image block for the current PU (212). The video encoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by one or more motion vectors for the current PU.
In addition, the video encoder may generate a candidate prediction motion vector list for the current PU (213). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the list of candidate prediction motion vectors for the current PU according to one or more of the possible implementations described below with respect to fig. 8-12. In some possible embodiments, when the video encoder generates the candidate prediction motion vector list in the AMVP operation 210, the candidate prediction motion vector list may be limited to two candidate prediction motion vectors. In contrast, when the video encoder generates the candidate prediction motion vector list in the merge operation, the candidate prediction motion vector list may include more candidate prediction motion vectors (e.g., five candidate prediction motion vectors).
After generating the candidate predictive motion vector list for the current PU, the video encoder may generate one or more Motion Vector Differences (MVDs) for each candidate predictive motion vector in the candidate predictive motion vector list (214). The video encoder may generate a motion vector difference for the candidate prediction motion vector by determining a difference between the motion vector indicated by the candidate prediction motion vector and a corresponding motion vector of the current PU.
If the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bi-predicted, the video encoder may generate two MVDs for each candidate prediction motion vector. The first MVD may indicate a difference between a motion vector of the candidate prediction motion vector and a list 0 motion vector of the current PU. The second MVD may indicate a difference between a motion vector of the candidate prediction motion vector and a list 1 motion vector of the current PU.
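The MVD derivation above can be illustrated with a small sketch; motion vectors are represented here as simple (horizontal, vertical) integer tuples, which is an assumption made only for illustration.

```python
def motion_vector_difference(pu_mv, candidate_mv):
    # MVD = (PU motion vector found by IME/FME) - (motion vector of the candidate
    # predictor), computed component-wise.
    return (pu_mv[0] - candidate_mv[0], pu_mv[1] - candidate_mv[1])

# Uni-directionally predicted PU: a single MVD against the list 0 (or list 1) candidate.
mvd_uni = motion_vector_difference((5, -3), (4, -1))        # -> (1, -2)

# Bi-directionally predicted PU: one MVD per reference list.
mvd_l0 = motion_vector_difference((5, -3), (4, -1))         # list 0
mvd_l1 = motion_vector_difference((-2, 7), (0, 6))          # list 1
```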
The video encoder may select one or more of the candidate predicted motion vectors from the list of candidate predicted motion vectors (215). The video encoder may select the one or more candidate predictive motion vectors in various ways. For example, the video encoder may select the candidate predicted motion vector whose indicated motion vector matches the motion vector to be encoded with the least error, which may reduce the number of bits required to represent the motion vector difference for the candidate predicted motion vector.
After selecting the one or more candidate predictive motion vectors, the video encoder may output one or more reference picture indices for the current PU, one or more candidate predictive motion vector indices, and one or more motion vector differences for the one or more selected candidate predictive motion vectors (216).
In examples where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is uni-directionally predicted, the video encoder may output either the reference picture index for list 0 ("ref_idx_l0") or the reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predictive motion vector index ("mvp_l0_flag") indicating a position of a selected candidate predictive motion vector for the list 0 motion vector of the current PU in the candidate predictive motion vector list. Alternatively, the video encoder may output a candidate predictive motion vector index ("mvp_l1_flag") indicating a position of a selected candidate predictive motion vector for the list 1 motion vector of the current PU in the candidate predictive motion vector list. The video encoder may also output the MVD for the list 0 motion vector or the list 1 motion vector for the current PU.
In an example where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bi-directionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predictive motion vector index ("mvp_l0_flag") indicating a position of a selected candidate predictive motion vector for the list 0 motion vector of the current PU in the candidate predictive motion vector list. In addition, the video encoder may output a candidate predictive motion vector index ("mvp_l1_flag") indicating a position of a selected candidate predictive motion vector for the list 1 motion vector of the current PU in the candidate predictive motion vector list. The video encoder may also output an MVD for the list 0 motion vector for the current PU and an MVD for the list 1 motion vector for the current PU.
Fig. 7 is an exemplary flowchart of motion compensation performed by a video decoder (e.g., video decoder 200) in an embodiment of the present application.
When the video decoder performs motion compensation operation 220, the video decoder may receive an indication of the selected candidate predictive motion vector for the current PU (222). For example, the video decoder may receive a candidate predictive motion vector index indicating a position of the selected candidate predictive motion vector within a candidate predictive motion vector list of the current PU.
The video decoder may receive a first candidate prediction motion vector index and a second candidate prediction motion vector index if the motion information of the current PU is encoded using AMVP mode and the current PU is bi-directionally predicted. The first candidate predicted motion vector index indicates the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU. The second candidate predicted motion vector index indicates a position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU. In some possible implementations, a single syntax element may be used to identify two candidate predictive motion vector indices.
In addition, the video decoder may generate a candidate prediction motion vector list for the current PU (224). The video decoder may generate this list of candidate prediction motion vectors for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to fig. 8-12 to generate a list of candidate prediction motion vectors for the current PU. When the video decoder generates a temporal candidate prediction motion vector for the candidate prediction motion vector list, the video decoder may explicitly or implicitly set a reference picture index that identifies the reference picture that includes the co-located PU, as described above with respect to fig. 5.
After generating the candidate predictive motion vector list for the current PU, the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidate predictive motion vectors in the candidate predictive motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using AMVP mode, the video decoder may reconstruct one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidate predicted motion vectors and the one or more MVDs indicated in the codestream. The reference picture index and the prediction direction identifier of the current PU may be the same as those of the one or more selected candidate predictive motion vectors. After determining the motion information of the current PU, the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
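A minimal sketch of the motion information reconstruction described in steps 224 to 226, under the assumption that motion vectors are simple (horizontal, vertical) tuples:

```python
def reconstruct_motion_vectors(mode, predictor_mvs, mvds=None):
    # Merge mode: the motion vectors are copied from the selected candidate.
    if mode == "merge":
        return list(predictor_mvs)
    # AMVP mode: each motion vector is the candidate predictor plus the MVD parsed
    # from the codestream (one MVD per reference list); the reference picture index
    # and prediction direction follow the selected candidate.
    return [(p[0] + d[0], p[1] + d[1]) for p, d in zip(predictor_mvs, mvds)]

# Example: AMVP, bi-directionally predicted PU.
# reconstruct_motion_vectors("amvp", [(4, -1), (0, 6)], [(1, -2), (-2, 1)])
# -> [(5, -3), (-2, 7)]
```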
Fig. 8 is an exemplary diagram of a Coding Unit (CU) and its associated neighboring image blocks in an embodiment of the present application, illustrating a CU 250 and exemplary candidate predicted motion vector positions 252A-252E associated with CU 250. Candidate predicted motion vector positions 252A-252E may be collectively referred to herein as candidate predicted motion vector positions 252. A candidate predicted motion vector position 252 represents a spatial candidate predicted motion vector in the same image as CU 250. Candidate predicted motion vector position 252A is located to the left of CU 250. Candidate predicted motion vector position 252B is located above CU 250. Candidate predicted motion vector position 252C is located to the upper right of CU 250. Candidate predicted motion vector position 252D is located to the lower left of CU 250. Candidate predicted motion vector position 252E is located to the upper left of CU 250. Fig. 8 provides an exemplary implementation of a way in which the inter prediction module 121 and the motion compensation module 162 may generate the list of candidate prediction motion vectors. The implementation will be explained below with reference to inter prediction module 121, but it should be understood that motion compensation module 162 may implement the same technique and thus generate the same list of candidate prediction motion vectors.
Fig. 9 is an exemplary flowchart for constructing a candidate predicted motion vector list in the embodiment of the present application. The technique of fig. 9 will be described with reference to a list including five candidate predicted motion vectors, although the techniques described herein may also be used with lists having other sizes. The five candidate predicted motion vectors may each have an index (e.g., 0 to 4). The technique of fig. 9 will be described with reference to a general video decoder. A general video decoder may illustratively be a video encoder (e.g., video encoder 100) or a video decoder (e.g., video decoder 200).
To construct the candidate prediction motion vector list according to the embodiment of fig. 9, the video decoder first considers the four spatial candidate prediction motion vectors (902). The four spatial candidate predicted motion vectors may include candidate predicted motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate predicted motion vectors correspond to motion information of four PUs in the same picture as the current CU (e.g., CU 250). The video decoder may consider the four spatial candidate predictive motion vectors in the list in a particular order. For example, candidate predicted motion vector position 252A may be considered first. Candidate predicted motion vector position 252A may be assigned to index 0 if candidate predicted motion vector position 252A is available. If candidate predicted motion vector position 252A is not available, the video decoder may not include candidate predicted motion vector position 252A in the candidate predicted motion vector list. A candidate predicted motion vector position may be unavailable for various reasons. For example, a candidate predicted motion vector position may not be available if it is not within the current picture. In another possible implementation, a candidate predicted motion vector position may not be available if it is intra predicted. In another possible implementation, a candidate predicted motion vector position may not be available if it is in a different slice than the current CU.
After considering candidate predicted motion vector position 252A, the video decoder may next consider candidate predicted motion vector position 252B. If candidate predicted motion vector position 252B is available and different from candidate predicted motion vector position 252A, the video decoder may add candidate predicted motion vector position 252B to the candidate predicted motion vector list. In this particular context, the terms "same" and "different" refer to the motion information associated with the candidate predicted motion vector positions. Thus, two candidate predicted motion vector positions are considered to be the same if they have the same motion information and are considered to be different if they have different motion information. If candidate predicted motion vector position 252A is not available, the video decoder may assign candidate predicted motion vector position 252B to index 0. If candidate predicted motion vector position 252A is available, the video decoder may assign candidate predicted motion vector position 252B to index 1. If candidate predicted motion vector position 252B is not available or is the same as candidate predicted motion vector position 252A, the video decoder skips candidate predicted motion vector position 252B and does not include it in the candidate predicted motion vector list.
Candidate predicted motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If candidate predicted motion vector position 252C is available and is not the same as candidate predicted motion vector positions 252B and 252A, the video decoder assigns candidate predicted motion vector position 252C to the next available index. If candidate predicted motion vector position 252C is not available or is not different from at least one of candidate predicted motion vector positions 252A and 252B, the video decoder does not include candidate predicted motion vector position 252C in the candidate predicted motion vector list. Next, the video decoder considers candidate predicted motion vector position 252D. If candidate predicted motion vector position 252D is available and is not the same as candidate predicted motion vector positions 252A, 252B, and 252C, the video decoder assigns candidate predicted motion vector position 252D to the next available index. If candidate predicted motion vector position 252D is not available or is not different from at least one of candidate predicted motion vector positions 252A, 252B, and 252C, the video decoder does not include candidate predicted motion vector position 252D in the candidate predicted motion vector list. The above embodiments generally describe that candidate predicted motion vectors 252A-252D are exemplarily considered for inclusion in the candidate predicted motion vector list, but in some implementations all candidate predicted motion vectors 252A-252D may be added to the candidate predicted motion vector list first, with duplicates later removed from the candidate predicted motion vector list.
After the video decoder considers the first four spatial candidate predicted motion vectors, the list of candidate predicted motion vectors may include four spatial candidate predicted motion vectors or the list may include less than four spatial candidate predicted motion vectors. If the list includes four spatial candidate predicted motion vectors (904, yes), the video decoder considers the temporal candidate predicted motion vector (906). The temporal candidate prediction motion vector may correspond to motion information of a co-located PU of an image different from the current image. If the temporal candidate predicted motion vector is available and different from the first four spatial candidate predicted motion vectors, the video decoder assigns the temporal candidate predicted motion vector to index 4. If the temporal candidate predicted motion vector is not available or is the same as one of the first four spatial candidate predicted motion vectors, the video decoder does not include the temporal candidate predicted motion vector in the list of candidate predicted motion vectors. Thus, after the video decoder considers the temporal candidate predicted motion vector (906), the candidate predicted motion vector list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the temporal candidate predicted motion vector considered at block 906) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902). If the candidate predicted motion vector list includes five candidate predicted motion vectors (908, yes), the video decoder completes building the list.
If the candidate predicted motion vector list includes four candidate predicted motion vectors (908, no), the video decoder may consider a fifth spatial candidate predicted motion vector (910). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vectors at locations 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate predicted motion vector to the list of candidate predicted motion vectors, the fifth spatial candidate predicted motion vector being assigned to index 4. If the candidate predicted motion vector at location 252E is not available or is not different from the candidate predicted motion vectors at candidate predicted motion vector locations 252A, 252B, 252C, and 252D, the video decoder may not include the candidate predicted motion vector at location 252E in the list of candidate predicted motion vectors. Thus, after considering the fifth spatial candidate predicted motion vector (910), the list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the fifth spatial candidate predicted motion vector considered at block 910) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902).
If the candidate predicted motion vector list includes five candidate predicted motion vectors (912, yes), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes four candidate predicted motion vectors (912, no), the video decoder adds the artificially generated candidate predicted motion vector (914) until the list includes five candidate predicted motion vectors (916, yes).
If the list includes less than four spatial candidate predicted motion vectors after the video decoder considers the first four spatial candidate predicted motion vectors (904, no), the video decoder may consider a fifth spatial candidate predicted motion vector (918). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vector already included in the candidate predicted motion vector list, the video decoder may add a fifth spatial candidate predicted motion vector to the candidate predicted motion vector list, the fifth spatial candidate predicted motion vector being assigned to the next available index. If the candidate predicted motion vector at location 252E is not available or is not different from one of the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may not include the candidate predicted motion vector at location 252E in the candidate predicted motion vector list. The video decoder may then consider the temporal candidate prediction motion vector (920). If a temporal candidate predictive motion vector is available and different from a candidate predictive motion vector already included in the list of candidate predictive motion vectors, the video decoder may add the temporal candidate predictive motion vector to the list of candidate predictive motion vectors, the temporal candidate predictive motion vector being assigned to the next available index. The video decoder may not include the temporal candidate predictive motion vector in the list of candidate predictive motion vectors if the temporal candidate predictive motion vector is not available or is not different from one of the candidate predictive motion vectors already included in the list of candidate predictive motion vectors.
If the candidate predicted motion vector list includes five candidate predicted motion vectors after considering the fifth spatial candidate predicted motion vector (block 918) and the temporal candidate predicted motion vector (block 920) (922, yes), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes less than five candidate predicted motion vectors (922, no), the video decoder adds the artificially generated candidate predicted motion vector (914) until the list includes five candidate predicted motion vectors (916, yes).
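The construction flow of fig. 9 can be condensed into the following sketch. It is illustrative only: candidate availability, the candidate objects themselves, and the artificial candidate generator are assumed to be provided elsewhere, and equality between candidates is assumed to compare their motion information.

```python
MAX_CANDIDATES = 5

def build_candidate_list(spatial_abcd, temporal, spatial_e, make_artificial):
    # spatial_abcd: candidates at positions 252A-252D (None when unavailable);
    # temporal: the co-located candidate or None; spatial_e: position 252E or None;
    # make_artificial(): returns an artificially generated candidate.
    cand_list = []

    def consider(cand):
        # A candidate is added only if it is available, the list is not yet full,
        # and its motion information differs from every candidate already in the list.
        if cand is not None and len(cand_list) < MAX_CANDIDATES and cand not in cand_list:
            cand_list.append(cand)

    for cand in spatial_abcd:          # block 902: the first four spatial candidates
        consider(cand)
    if len(cand_list) == 4:            # block 904, yes
        consider(temporal)             # block 906: temporal candidate
        consider(spatial_e)            # block 910: fifth spatial candidate, if still short
    else:                              # block 904, no
        consider(spatial_e)            # block 918: fifth spatial candidate
        consider(temporal)             # block 920: temporal candidate
    while len(cand_list) < MAX_CANDIDATES:
        cand_list.append(make_artificial())   # block 914: artificially generated candidates
    return cand_list
```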
According to the techniques of this application, additional merge candidate predicted motion vectors may be artificially generated after the spatial candidate predicted motion vector and the temporal candidate predicted motion vector to fix the size of the merge candidate predicted motion vector list to a specified number of merge candidate predicted motion vectors (e.g., five in the possible implementation of fig. 9 above). The additional Merge candidate predicted motion vectors may include an exemplary combined bi-predictive Merge candidate predicted motion vector (candidate predicted motion vector 1), a scaled bi-predictive Merge candidate predicted motion vector (candidate predicted motion vector 2), and a zero vector Merge/AMVP candidate predicted motion vector (candidate predicted motion vector 3).
Fig. 10 is an exemplary diagram illustrating adding a combined candidate motion vector to a merge mode candidate prediction motion vector list according to an embodiment of the present application. The combined bi-predictive merge candidate prediction motion vector may be generated by combining the original merge candidate prediction motion vectors. Specifically, two of the original candidate predicted motion vectors (which have mvL0 and refIdxL0 or mvL1 and refIdxL1) may be used to generate bi-predictive merge candidate predicted motion vectors. In fig. 10, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list 0 uni-prediction and the prediction type of the other candidate predicted motion vector is list 1 uni-prediction. In this possible implementation, mvL0_A and ref0 are picked from list 0 and mvL1_B and ref0 are picked from list 1, and then bi-predictive merge candidate predictive motion vectors (which have mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may be generated and checked for differences from the candidate predictive motion vectors already included in the candidate predictive motion vector list. If they are different, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
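A sketch of the combined candidate generation of fig. 10, using a hypothetical MergeCandidate container that carries a list 0 part (mvL0, refIdxL0) and/or a list 1 part (mvL1, refIdxL1):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MergeCandidate:                       # hypothetical container for illustration
    mvL0: Optional[Tuple[int, int]] = None
    refIdxL0: Optional[int] = None
    mvL1: Optional[Tuple[int, int]] = None
    refIdxL1: Optional[int] = None

def combined_bi_predictive_candidate(cand_a, cand_b, cand_list):
    # Combine the list 0 motion of one original candidate with the list 1 motion
    # of another original candidate to form a bi-predictive merge candidate.
    if cand_a.mvL0 is None or cand_b.mvL1 is None:
        return
    new_cand = MergeCandidate(mvL0=cand_a.mvL0, refIdxL0=cand_a.refIdxL0,
                              mvL1=cand_b.mvL1, refIdxL1=cand_b.refIdxL1)
    # Add it only if it differs from candidates already in the list.
    if new_cand not in cand_list:
        cand_list.append(new_cand)
```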
Fig. 11 is an exemplary diagram illustrating adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list according to an embodiment of the present disclosure. The scaled bi-predictive merge candidate prediction motion vector may be generated by scaling the original merge candidate prediction motion vector. Specifically, a candidate predicted motion vector (which may have mvLX and refIdxLX) from the original candidate predicted motion vectors may be used to generate the bi-predictive merge candidate predicted motion vector. In the possible embodiment of fig. 11, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list 0 uni-prediction and the prediction type of the other candidate predicted motion vector is list 1 uni-prediction. In this possible implementation, mvL0_A and ref0 can be picked from list 0, and ref0 can be copied to the reference index ref0' in list 1. Then, mvL0'_A can be calculated by scaling mvL0_A with ref0 and ref0'. The scaling may depend on the POC distance. Then, a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) may be generated and checked for duplication. If it is not a duplicate, it may be added to the merge candidate prediction motion vector list.
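A sketch of the scaling step described above. The linear POC-distance scaling factor used here is an assumption made for illustration, not a normative formula of this application; poc_list0 and poc_list1 are assumed mappings from reference index to POC.

```python
def scaled_bi_predictive_candidate(mvL0_A, ref0, poc_current, poc_list0, poc_list1):
    # poc_list0 / poc_list1 map a reference index to the POC of that reference picture
    # (assumed inputs for this sketch). ref0 is copied into list 1 as ref0'.
    ref0_prime = ref0
    dist_l0 = poc_current - poc_list0[ref0]        # POC distance to the list 0 reference
    dist_l1 = poc_current - poc_list1[ref0_prime]  # POC distance to the list 1 reference
    if dist_l0 == 0:
        return None
    scale = dist_l1 / dist_l0                      # linear POC-distance scaling (assumption)
    mvL0_prime_A = (mvL0_A[0] * scale, mvL0_A[1] * scale)
    # The resulting candidate has (mvL0_A, ref0) in list 0 and (mvL0'_A, ref0') in list 1.
    return (mvL0_A, ref0), (mvL0_prime_A, ref0_prime)

# Example: current picture at POC 8, list 0 reference at POC 4, list 1 reference at POC 12.
# scaled_bi_predictive_candidate((6, -2), 0, 8, {0: 4}, {0: 12})
# -> ((6, -2), 0), ((-6.0, 2.0), 0)
```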
Fig. 12 is an exemplary diagram illustrating adding a zero motion vector to the merge mode candidate prediction motion vector list in the embodiment of the present application. The zero vector merge candidate prediction motion vector may be generated by combining the zero vector and a reference index that may be referenced. If the zero vector candidate predicted motion vector is not duplicate, it may be added to the merge candidate predicted motion vector list. For each generated merge candidate prediction motion vector, the motion information may be compared to the motion information of the previous candidate prediction motion vector in the list.
In a possible implementation, if a newly generated candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the generated candidate predicted motion vector is added to the merge candidate predicted motion vector list. The process of determining whether a candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list is sometimes referred to as pruning. With pruning, each newly generated candidate predicted motion vector may be compared to the existing candidate predicted motion vectors in the list. In some possible implementations, the pruning operation may include comparing one or more new candidate predicted motion vectors with the candidate predicted motion vectors already in the candidate predicted motion vector list and not adding new candidate predicted motion vectors that are duplicates of candidate predicted motion vectors already in the list. In other possible embodiments, the pruning operation may include adding one or more new candidate predicted motion vectors to the list of candidate predicted motion vectors and later removing duplicate candidate predicted motion vectors from the list. It should be understood that in other possible embodiments, the pruning step described above may not be performed.
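A minimal sketch of the pruning check, reusing the hypothetical MergeCandidate fields from the fig. 10 sketch above (two candidates are treated as duplicates when all of their motion information matches):

```python
def is_duplicate(new_cand, cand_list):
    # Duplicate = identical motion vectors and reference indices for both reference lists.
    return any(new_cand.mvL0 == c.mvL0 and new_cand.refIdxL0 == c.refIdxL0 and
               new_cand.mvL1 == c.mvL1 and new_cand.refIdxL1 == c.refIdxL1
               for c in cand_list)

def prune_and_add(new_cand, cand_list):
    # Pruning variant that compares before adding; the alternative described above
    # adds first and removes duplicates from the list later.
    if not is_duplicate(new_cand, cand_list):
        cand_list.append(new_cand)
```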
In the various possible embodiments of fig. 5-7, fig. 9-12, etc., the spatial candidate prediction modes are, exemplarily, the five positions 252A to 252E shown in fig. 8, i.e., positions adjacent to the image block to be processed. On the basis of the various possible embodiments of fig. 5-7, fig. 9-12, etc., in some possible embodiments, the spatial candidate prediction modes may further include positions that are within a preset distance from the image block to be processed but are not adjacent to it. Illustratively, such positions may be as shown at 252F through 252J in fig. 13. It should be understood that fig. 13 is an exemplary schematic diagram of a coding unit and the neighboring-position image blocks associated with it in the embodiment of the present application. Such positions belong to image blocks that are located in the same image frame as the image block to be processed, are not adjacent to the image block to be processed, and have been completely reconstructed by the time the image block to be processed is processed.
Proposals such as JVET-K0286, JVET-K0198, and JVET-K0339 propose adding non-adjacent spatial fusion candidates to the fusion candidate list, so that the number of fusion candidates in merge/skip mode is increased and the prediction efficiency is improved.
The fusion candidate list in JVET-K0286 is constructed as follows:
step 1: a spatial merge candidate (spatial merge candidate) spatially adjacent to the current block is added to the merge candidate list of the current block in the same way as in HEVC. The spatially adjacent spatial fusion candidates are the motion information of A, B, C, D, E blocks in fig. 14, and their order of addition to the fusion candidate list is A, B, C, D, E. The A, B, C, …, I, etc. blocks in FIG. 14 are all 4x4 blocks.
Step 2: a temporal merge candidate (temporal merge candidate) of the current block is added to the merge candidate list of the current block, which is the same method as in HEVC.
And step 3: adding a non-adjacent spatial fusion candidate (non-adjacencies spatial fusion candidate) that is not adjacent to the current block spatial domain to the fusion candidate list of the current block. The non-adjacent spatial fusion candidates are motion information of a1, B1, C1, D1, E1, a2, B2, C2, D2, E2, F, G, H, I blocks in fig. 14; the order in which non-adjacent spatial fusion candidates join the fusion candidate list is a1, B1, C1, D1, E1, F, G, H, I, A2, B2, C2, D2, E2. As a simplification, the jfet-K0286 proposal also proposes that non-adjacent spatial fusion candidates contain only motion information of blocks a1, B1, C1, D1, E1, a2, B2, C2, D2, E2.
And 4, step 4: other types of fusion candidates are added, such as bi-predictive fusion candidates (bi-predictive fusion candidates) and zero motion vector fusion candidates (zero motion vector fusion candidates).
Note that the length of the fusion candidate list is a predetermined value M, for example, 6, 8, 10, or the like. When the number of the fusion candidates in the fusion candidate list reaches a preset fixed value M, the construction of the fusion candidate list is completed, and the remaining fusion candidates are not added into the fusion candidate list any more. In addition, if a fusion candidate is the same as the existing fusion candidate in the fusion candidate list, the fusion candidate may not be added to the fusion candidate list, so as to avoid redundant information due to the repeated fusion candidates in the fusion candidate list.
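The four steps above, together with the length limit M and the duplicate check, can be sketched as follows. The helpers that return the motion information of the individual blocks in fig. 14 are hypothetical and only named for illustration.

```python
def build_fusion_candidate_list(current_block, M=6):
    fusion_list = []

    def add(candidate):
        # Stop adding once the list reaches the preset length M, and skip any
        # candidate identical to one already in the list.
        if candidate is not None and len(fusion_list) < M and candidate not in fusion_list:
            fusion_list.append(candidate)

    # Step 1: adjacent spatial fusion candidates, in the order A, B, C, D, E.
    for position in ("A", "B", "C", "D", "E"):
        add(adjacent_spatial_candidate(current_block, position))        # assumed helper
    # Step 2: temporal fusion candidate.
    add(temporal_fusion_candidate(current_block))                        # assumed helper
    # Step 3: non-adjacent spatial fusion candidates, in the order used by JVET-K0286.
    for position in ("A1", "B1", "C1", "D1", "E1", "F", "G", "H", "I",
                     "A2", "B2", "C2", "D2", "E2"):
        add(non_adjacent_spatial_candidate(current_block, position))     # assumed helper
    # Step 4: other fusion candidate types (bi-predictive, zero motion vector).
    for extra in other_fusion_candidates(current_block):                 # assumed helper
        add(extra)
    return fusion_list
```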
More non-adjacent spatial fusion candidates are used in JVET-K0339, as shown in fig. 15. In fig. 15, blocks 1 to 5 are conventional spatial fusion candidates, and blocks 6 to 48 are non-adjacent spatial fusion candidates.
When decoding, if the current block uses the merge/skip mode, the merge index (merge index) is parsed from the code stream, and the fusion candidate corresponding to the merge index is selected from the fusion candidate list constructed by the above method, so as to obtain the motion information of the current block. Motion compensation is then performed according to the motion information of the current block to obtain a predicted image of the current block. The predicted image of the current block and the residual image of the current block are added to obtain a reconstructed image of the current block, thereby completing the decoding of the current block.
The present application provides a method for constructing a fusion candidate list by using non-adjacent spatial fusion candidates, where the non-adjacent spatial fusion candidates used come from the motion information of the row of blocks above or the column of blocks to the left of the coding tree unit in which the current block is located, and the number of non-adjacent spatial fusion candidates is small, so that the complexity of the fusion candidate list construction method can be kept low.
The method of constructing a fusion candidate list proposed in the present application may involve a construction process of a fusion candidate list (merge candidate list) in a merge/skip mode. It should be understood that the method for constructing the fusion candidate list proposed herein can also be used for the construction process of other candidate prediction motion information sets including the AMVP technique.
The fusion candidate list construction method proposed by the present application can be applied to a video codec, for example, a video codec in a video communication system. The method for constructing the fusion candidate list can be applied to the motion information derivation of the inter-frame prediction mode, and the process of constructing the fusion candidate list at the encoding end and the decoding end can be the same.
In a first possible implementation manner of the fusion candidate list construction method according to the embodiment of the present application, the following 4 steps may be included. The length of the fusion candidate list may be N, that is, the number of fusion candidates included in the fusion candidate list obtained after the construction is completed is N, and for example, N is 5, 6, 8, or 10.
Step 1: add the spatial fusion candidates spatially adjacent to the current block to the fusion candidate list of the current block.
The current block may also be referred to as an image block to be processed, and the current block may be a coding unit or a prediction unit divided by a coding tree unit.
Specific implementations of this step can be found in the relevant sections of the present application, for example, reference may be made to a method of obtaining spatial fusion candidates in HEVC and adding the spatial fusion candidates to a fusion candidate list. The spatially adjacent spatial fusion candidates may be the motion information of A, B, C, D, E blocks in fig. 14, and their order of addition to the fusion candidate list may be A, B, C, D, E.
If the coordinates of the upper left corner of the current block are P0 = (x0, y0), the width of the current block is W, and the height is H, then the coordinates of the upper left corner of the A block are PA = (x0-4, y0+H-4); the coordinates of the upper left corner of the B block are PB = (x0+W-4, y0-4); the coordinates of the upper left corner of the C block are PC = (x0+W, y0-4); the coordinates of the upper left corner of the D block are PD = (x0-4, y0+H); the coordinates of the upper left corner of the E block are PE = (x0-4, y0-4). An exemplary size of the A, B, C, D, E blocks is 4x4.
Usually, the motion information is stored in the motion vector field in units of 4x4 blocks, and the motion information of a block can be found from the coordinates of a pixel covered by the block in the motion vector field. For example, if the coordinates of the upper left corner of a block are (x, y), then the coordinates of the corresponding element of the block in the motion vector field are (x >> 2, y >> 2), where ">>" denotes a right shift operation.
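As a minimal sketch of the lookup just described (assuming, purely for illustration, that the motion vector field is a dictionary keyed by 4x4-grid coordinates), the mapping from pixel coordinates to the field and the A to E neighbor positions of the previous paragraph could be written as:

```python
# Sketch, not normative: the motion vector field is assumed to be a dict keyed
# by 4x4-grid coordinates, purely for illustration.
def mv_field_entry(motion_field, x, y):
    """Motion information of the 4x4 block covering pixel (x, y)."""
    return motion_field.get((x >> 2, y >> 2))   # ">>" is the right-shift operation

def adjacent_block_positions(x0, y0, w, h):
    """Top-left coordinates of the spatially adjacent A, B, C, D, E blocks."""
    return {
        "A": (x0 - 4, y0 + h - 4),
        "B": (x0 + w - 4, y0 - 4),
        "C": (x0 + w, y0 - 4),
        "D": (x0 - 4, y0 + h),
        "E": (x0 - 4, y0 - 4),
    }
```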
Step 2: add the temporal fusion candidate of the current block to the fusion candidate list of the current block.
Specific implementations of this step can be found in the relevant section of the present application, for example, reference may be made to a method of obtaining temporal fusion candidates in HEVC and adding the temporal fusion candidates to a fusion candidate list. The temporal fusion candidate for the current block may also be an ATMVP fusion candidate in the VTM reference software.
Step 3: acquire non-adjacent spatial fusion candidates from the upper neighboring blocks and the left neighboring blocks of the CTU in which the current block is located according to the position of the current block, and add the non-adjacent spatial fusion candidates to the fusion candidate list of the current block.
Note that the coordinates of the top left corner of the current block are P0 = (x0, y0), the coordinates of the top left corner of the CTU in which the current block is located are P1 = (x1, y1), the width of the current block is W0, the height is H0, the width of the CTU is W1, and the height is H1. If the motion information is stored with 4x4 blocks as the basic unit, the upper neighboring blocks and the left neighboring blocks of the CTU may be 4x4 blocks.
When acquiring the non-adjacent spatial fusion candidates, as shown in fig. 16, a block A1 may be selected from the left-adjacent blocks of the CTU in which the current block is located, and the motion information of A1 may be used as a non-adjacent spatial fusion candidate CA1; a block B1 is selected from the upper neighboring blocks of the CTU in which the current block is located, and the motion information of B1 is taken as a non-adjacent spatial fusion candidate CB1.
The A1 block covers the coordinate PA1, and the coordinate PA1 is located in the area where the left side of the current block intersects the left-adjacent blocks of the CTU. In other words, denoting PA1 = (xA, yA), xA is less than x1, yA is greater than or equal to y0, and yA is less than y0+H0. For example: PA1 = (x1-1, y0+H0-1), or PA1 = (x1-1, y0+(H0/2)), or PA1 = (x1-1, y0). Accordingly, when PA1 = (x1-1, y0+H0-1), the coordinates of the upper left corner of the A1 block may be (x1-4, y0+H0-4).
The B1 block covers the coordinate PB1, and the coordinate PB1 is located in the area where the top of the current block intersects the upper neighboring blocks of the CTU. In other words, denoting PB1 = (xB, yB), yB is less than y1, xB is greater than or equal to x0, and xB is less than x0+W0. For example: PB1 = (x0+W0-1, y1-1), or PB1 = (x0+(W0/2), y1-1), or PB1 = (x0, y1-1).
CA1 and CB1 are added to the fusion candidate list of the current block, respectively, in the order CA1, CB1 or the order CB1, CA1.
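A minimal sketch of one choice of the positions PA1 and PB1 described above (the variants PA1 = (x1-1, y0+H0-1) and PB1 = (x0+W0-1, y1-1)); the function name is illustrative only.

```python
# One choice of PA1 and PB1 (the first variants listed above); illustrative only.
def non_adjacent_positions(x0, y0, w0, h0, x1, y1):
    pa1 = (x1 - 1, y0 + h0 - 1)   # in the column of blocks left-adjacent to the CTU
    pb1 = (x0 + w0 - 1, y1 - 1)   # in the row of blocks upper-adjacent to the CTU
    return pa1, pb1
```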
The manner in which a fusion candidate is added to the fusion candidate list may be referred to in the art. For example, the motion information of the A1 block (or B1 block), i.e., the fusion candidate CA1 (or CB1), is compared with one or more fusion candidates already in the fusion candidate list, and if the fusion candidate CA1 (or CB1) is the same as an existing fusion candidate in the fusion candidate list, the fusion candidate CA1 (or CB1) is marked as unavailable; otherwise, the fusion candidate CA1 (or CB1) may be marked as available. If the fusion candidate CA1 (or CB1) is not available, the fusion candidate CA1 (or CB1) is not added to the fusion candidate list; if the fusion candidate CA1 (or CB1) is available, the fusion candidate CA1 (or CB1) is added to the fusion candidate list.
As an improved method, the order in which CA1 and CB1 are added to the fusion candidate list can also be adaptively selected according to the position of the current block in the CTU. For example, if (x0-x1) ≤ (y0-y1), the order of addition to the list may be CA1 followed by CB1; otherwise, the order of addition to the list may be CB1 first and CA1 second. As another example, if (x0-x1) ≤ (y0-y1), the order of addition to the list may be CB1 first and CA1 second; otherwise, the order of addition to the list may be CA1 first and CB1 second.
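A sketch of the position-adaptive ordering in the first example above; the names CA1/CB1 simply stand for the candidates obtained from the positions PA1/PB1, and the function name is an assumption for illustration.

```python
# Position-adaptive insertion order for the two candidates; illustrative only.
def adaptive_order(x0, y0, x1, y1, ca1, cb1):
    if (x0 - x1) <= (y0 - y1):    # block relatively closer to the left CTU boundary
        return [ca1, cb1]
    return [cb1, ca1]
```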
Step 4: add other types of fusion candidates to the fusion candidate list of the current block. For example, a bi-predictive merge candidate (bi-predictive merge candidate) and a zero motion vector merge candidate (zero motion vector merge candidate) are added to the merge candidate list of the current block.
Specific implementations of this step can be found in the relevant sections of the present application. For example, reference may be made to a method in HEVC to obtain bi-predictive fusion candidates and zero motion vector fusion candidates and add them to the fusion candidate list.
When decoding the current block, if the current block uses the merge/skip mode, the decoder parses a merge index (merge index) from the code stream, and selects the fusion candidate corresponding to the merge index from the fusion candidate list constructed by the above method, thereby obtaining the motion information of the current block; then, motion compensation is performed according to the motion information of the current block to obtain a predicted image of the current block; and the predicted image of the current block and the residual image of the current block are added to obtain a reconstructed image of the current block, thereby completing the decoding of the current block.
It should be noted that the present application does not limit the use of other fusion candidates except the non-adjacent space fusion candidate obtained in step 3, the order in which each fusion candidate is added to the fusion candidate list, or the method for determining whether a fusion candidate can be added to the fusion candidate list. The foregoing related section of the present application describes, by way of example, possible implementations of other steps.
According to this embodiment of the application, motion information is selected from the upper neighboring blocks and the left neighboring blocks of the CTU as non-adjacent spatial fusion candidates; less spatial motion information is accessed, and the complexity is lower.
In a second possible implementation manner of the fusion candidate list construction method according to the embodiment of the present application, 4 steps may be included. Step 1, step 2 and step 4 of these 4 steps are the same as or similar to step 1, step 2 and step 4 of the first possible embodiment, and are not described again here. Mainly step 3 of these 4 steps is described below.
Step 3: according to the position of the current block, obtain 3 non-adjacent spatial fusion candidates from the upper neighboring blocks and the left neighboring blocks of the CTU in which the current block is located, and add the 3 non-adjacent spatial fusion candidates to the fusion candidate list of the current block.
Note that the coordinates of the top left corner of the current block are P0 = (x0, y0), the coordinates of the top left corner of the CTU in which the current block is located are P1 = (x1, y1), the width of the current block is W0, the height is H0, the width of the CTU is W1, and the height is H1. If the motion information is stored with 4x4 blocks as the basic unit, the upper neighboring blocks and the left neighboring blocks of the CTU may be 4x4 blocks.
For example, as shown in fig. 17, a block A2 is selected from the left neighboring blocks of the CTU in which the current block is located, and the motion information of A2 is taken as a non-adjacent spatial fusion candidate CA2; a block B2 is selected from the upper neighboring blocks of the CTU in which the current block is located, and the motion information of B2 is taken as a non-adjacent spatial fusion candidate CB2; a block C2 is selected from the upper neighboring blocks of the CTU in which the current block is located, the block C2 being located at the upper left of the current block, and the motion information of C2 is taken as a non-adjacent spatial fusion candidate CC2. It should be understood that the block C2 may also be selected from the left neighboring blocks of the CTU in which the current block is located, which is not limited in this embodiment of the application.
The A2 block may cover the coordinate PA2, and the coordinate PA2 is located in the area where the left side of the current block intersects the left-adjacent blocks of the CTU. In other words, denoting PA2 = (xA, yA), xA is less than x1, yA is greater than or equal to y0, and yA is less than y0+H0. For example: PA2 = (x1-1, y0+H0-1), or PA2 = (x1-1, y0+(H0/2)), or PA2 = (x1-1, y0).
The B2 block may cover the coordinate PB2, and the coordinate PB2 is located in the area where the top of the current block intersects the upper neighboring blocks of the CTU. In other words, denoting PB2 = (xB, yB), yB is less than y1, xB is greater than or equal to x0, and xB is less than x0+W0. For example: PB2 = (x0+W0-1, y1-1), or PB2 = (x0+(W0/2), y1-1), or PB2 = (x0, y1-1).
The C2 block may cover the coordinate PC2, and the coordinate PC2 is located in the area where the upper left of the current block intersects the left-adjacent or upper-adjacent blocks of the CTU. In other words, denoting PC2 = (xC, yC), xC is less than x0 and xC is greater than or equal to x1-1, yC is less than y0, and yC is greater than or equal to y1-1. For example: if (x0-x1) ≥ (y0-y1), then PC2 = (x0-y0+y1-1, y1-1); if (x0-x1) < (y0-y1), then PC2 = (x1-1, y0-x0+x1-1).
CA2, CB2, CC2 may be added to the fusion candidate list of the current block, respectively. For example, CA2, CB2, CC2 are sequentially added to the fusion candidate list of the current block, or CC2, CA2, CB2 are sequentially added to the fusion candidate list of the current block. As another example, if (x0-x1) ≤ (y0-y1), the order of addition to the list is CA2, CB2, CC2; otherwise, the order of addition to the list is CB2, CA2, CC2.
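A sketch of the diagonal rule for the upper-left position PC2 given above; the function name is illustrative, and the other listed variants of PC2 are equally possible.

```python
# Diagonal rule for PC2; illustrative only.
def pc2_position(x0, y0, x1, y1):
    if (x0 - x1) >= (y0 - y1):            # the diagonal reaches the upper CTU boundary first
        return (x0 - y0 + y1 - 1, y1 - 1)
    return (x1 - 1, y0 - x0 + x1 - 1)     # the diagonal reaches the left CTU boundary first
```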
In a third possible implementation manner of the fusion candidate list construction method according to the embodiment of the present application, 4 steps may be included. Step 1, step 2 and step 4 of these 4 steps are the same as or similar to step 1, step 2 and step 4 of the first possible embodiment, and are not repeated here. The 3rd step of these 4 steps will be mainly described below.
Step 3: according to the position of the current block, obtain 5 non-adjacent spatial fusion candidates from the upper neighboring blocks and the left neighboring blocks of the CTU in which the current block is located, and add the 5 non-adjacent spatial fusion candidates to the fusion candidate list of the current block.
Note that the coordinates of the top left corner of the current block are P0 = (x0, y0), the coordinates of the top left corner of the CTU in which the current block is located are P1 = (x1, y1), the width of the current block is W0, the height is H0, the width of the CTU is W1, and the height is H1. If the motion information is stored with 4x4 blocks as the basic unit, the upper neighboring blocks and the left neighboring blocks of the CTU may be 4x4 blocks.
For example, as shown in fig. 18, a block A3 is selected from the left neighboring blocks of the CTU in which the current block is located, taking the motion information of A3 as a non-adjacent spatial fusion candidate CA3; a block B3 is selected from the upper neighboring blocks of the CTU in which the current block is located, taking the motion information of B3 as a non-adjacent spatial fusion candidate CB3; a block C3 is selected from the upper neighboring blocks or the left neighboring blocks of the CTU in which the current block is located, the block C3 being located at the upper left of the current block, and the motion information of C3 is used as a non-adjacent spatial fusion candidate CC3; a block D3 is selected from the left neighboring blocks of the CTU in which the current block is located, the block D3 being located at the lower left of the current block, and the motion information of D3 is taken as a non-adjacent spatial fusion candidate CD3; a block E3 is selected from the upper neighboring blocks of the CTU in which the current block is located, the block E3 being located at the upper right of the current block, and the motion information of E3 is used as a non-adjacent spatial fusion candidate CE3.
The A3 block may cover the coordinate PA3, and the coordinate PA3 is located in the area where the left side of the current block intersects the left-adjacent blocks of the CTU. In other words, denoting PA3 = (xA, yA), xA is less than x1, yA is greater than or equal to y0, and yA is less than y0+H0. For example, PA3 = (x1-1, y0+H0-1), or PA3 = (x1-1, y0+(H0/2)), or PA3 = (x1-1, y0).
The B3 block may cover the coordinate PB3, and the coordinate PB3 is located in the area where the top of the current block intersects the upper neighboring blocks of the CTU. In other words, denoting PB3 = (xB, yB), yB is less than y1, xB is greater than or equal to x0, and xB is less than x0+W0. For example, PB3 = (x0+W0-1, y1-1), or PB3 = (x0+(W0/2), y1-1), or PB3 = (x0, y1-1).
The C3 block may cover the coordinate PC3, and the coordinate PC3 is located in the area where the upper left of the current block intersects the upper-adjacent or left-adjacent blocks of the CTU. In other words, denoting PC3 = (xC, yC), xC is less than x0 and xC is greater than or equal to x1-1, yC is less than y0, and yC is greater than or equal to y1-1. For example: if (x0-x1) ≥ (y0-y1), then PC3 = (x0-y0+y1-1, y1-1); if (x0-x1) < (y0-y1), then PC3 = (x1-1, y0-x0+x1-1).
The D3 block may cover the coordinate PD3, and the coordinate PD3 is located in the area where the lower left of the current block intersects the left-adjacent blocks of the CTU. In other words, denoting PD3 = (xD, yD), xD is less than x1 and yD is greater than or equal to y0+H0. For example, PD3 = (x1-1, y0+H0+x0-x1), PD3 = (x1-1, y0+H0+((x0-x1)/2)), or PD3 = (x1-1, min(y0+H0+x0-x1, y1+H1-1)), where min(a, b) is the smaller of a and b.
The E3 block may cover the coordinate PE3, and the coordinate PE3 is located in the area where the upper right of the current block intersects the upper-adjacent blocks of the CTU. In other words, denoting PE3 = (xE, yE), xE is greater than or equal to x0+W0 and yE is less than y1. For example, PE3 = (x0+W0+y0-y1, y1-1), or PE3 = (x0+W0+((y0-y1)/2), y1-1), or PE3 = (min(x0+W0+y0-y1, x1+W1-1), y1-1), or PE3 = (min(x0+W0+y0-y1, x1+(W1×3/2)), y1-1).
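A sketch of the clamped variants of the below-left position PD3 and the above-right position PE3 listed above; min() keeps the positions within the left-adjacent column and the upper-adjacent row of the CTU, and the names are illustrative assumptions.

```python
# Clamped variants of PD3 and PE3; illustrative only.
def pd3_pe3_positions(x0, y0, w0, h0, x1, y1, w1, h1):
    pd3 = (x1 - 1, min(y0 + h0 + x0 - x1, y1 + h1 - 1))   # below-left, clamped to the CTU height
    pe3 = (min(x0 + w0 + y0 - y1, x1 + w1 - 1), y1 - 1)   # above-right, clamped to the CTU width
    return pd3, pe3
```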
CA3, CB3, CC3, CD3, CE3 are added to the fusion candidate list of the current block, respectively. For example, CA3, CB3, CC3, CD3, CE3 may be sequentially added to the fusion candidate list of the current block, or CA3, CB3, CC3, CE3, CD3 may be sequentially added to the fusion candidate list of the current block.
Alternatively, the order in which CA3, CB3, CC3, CD3, CE3 join the fusion candidate list of the current block may be determined in combination with the coordinates of the upper left corner of the current block and the coordinates of the upper left corner of the CTU in which the current block is located.
For example, if (x0-x1) ≤ (y0-y1), the order of addition to the list is CA3, CB3, CC3, CD3, CE3; otherwise, the order of addition to the list is CB3, CA3, CC3, CE3, CD3.
As another example: if (x0-x1) < (W1/2) and (y0-y1) < (H1/2), the order of addition is CA3, CB3, CC3, CE3, CD3;
if (x0-x1) ≥ (W1/2) and (y0-y1) < (H1/2), the order of addition is CB3, CC3, CE3, CA3, CD3;
if (x0-x1) < (W1/2) and (y0-y1) ≥ (H1/2), the order of addition is CA3, CC3, CD3, CB3, CE3;
if (x0-x1) ≥ (W1/2) and (y0-y1) ≥ (H1/2), the order of addition is CA3, CB3, CC3, CE3, CD3.
According to the position of the current block, the non-adjacent spatial fusion candidates are obtained from the upper neighboring blocks and the left neighboring blocks of the CTU, and the non-adjacent spatial fusion candidates are added to the fusion candidate list of the current block. The solution of this embodiment of the application accesses less spatial motion information, has lower complexity, and can retain most of the coding performance. In addition, according to the position of the current block, different orders are selected for adding the non-adjacent spatial fusion candidates to the fusion candidate list of the current block. When the identifiers of the fusion candidates in the fusion candidate list are variable-length coded, in a possible implementation the identification information of the fusion candidates added to the fusion candidate list earlier (generally the candidate prediction motion information that is more likely to become the finally coded/decoded motion information) is represented by shorter codewords, so that coding efficiency can be improved.
The application also provides a prediction method of the motion information of the image block. Fig. 19 exemplarily shows a flow of a prediction method according to an embodiment of the present application. The prediction method 1900 may include S1901, S1902, and S1903.
S1901, determining at least one target pixel point having a preset position relationship with the to-be-processed image block, where the target pixel point is adjacent to the straight line on which the upper edge of the coding tree unit in which the to-be-processed image block is located lies, or to the straight line on which the left edge of that coding tree unit lies, and the target pixel point is located outside the coding tree unit.
Here, the coordinates of the upper left corner of the coding tree unit are denoted PM = (xM, yM), the width of the coding tree unit is W, and the height is H, where xM is the horizontal coordinate and yM is the vertical coordinate. The upper edge of the coding tree unit is the region including the coordinates PG = (xG, yG), where xM ≤ xG ≤ xM+W-1 and yG = yM; the left edge of the coding tree unit is the region including the coordinates PH = (xH, yH), where yM ≤ yH ≤ yM+H-1 and xH = xM.
The target pixel point is adjacent to the straight line of the upper edge of the coding tree unit where the image block to be processed is located, namely the difference between the vertical coordinate of the target pixel point and the vertical coordinate of the upper edge of the coding tree unit where the image block to be processed is located is smaller than or equal to a preset first threshold; the target pixel point is adjacent to the straight line where the left edge of the coding tree unit where the image block to be processed is located, that is, the difference between the horizontal coordinate of the target pixel point and the horizontal coordinate of the left edge of the coding tree unit where the image block to be processed is located is smaller than or equal to a preset second threshold. The first threshold and the second threshold may be the same.
One example of the first threshold value and the second threshold value is 1. At this time, the difference between the vertical coordinate of the target pixel and the vertical coordinate of the upper edge of the coding tree unit where the image block to be processed is located is equal to 1, and the difference between the horizontal coordinate of the target pixel and the horizontal coordinate of the left edge of the coding tree unit where the image block to be processed is located is equal to 1.
The target pixel point is located outside the coding tree unit, which means that the target pixel point is not a pixel point in the coding tree unit.
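The adjacency and outside-the-CTU conditions of S1901 can be summarized by the following sketch, which assumes the first and second thresholds are both 1 (the example given above); the function and argument names are illustrative only.

```python
# Adjacency test of S1901 with both thresholds equal to 1; names are illustrative.
def is_target_pixel(px, py, ctu_x, ctu_y, threshold=1):
    near_top = 0 < ctu_y - py <= threshold    # just above the line of the CTU's upper edge
    near_left = 0 < ctu_x - px <= threshold   # just left of the line of the CTU's left edge
    return near_top or near_left              # either case lies outside the coding tree unit
```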
"At least one" in this embodiment of the application may be understood as one or more, that is, one target pixel point or a plurality of target pixel points having a preset position relationship with the to-be-processed image block may be determined. When a plurality of target pixel points are determined, the plurality of target pixel points may belong to the same coding unit, or different pixel points in the plurality of target pixel points may belong to different coding units.
The following details will describe various preset position relationships between the target pixel point and the image block to be processed in detail.
S1902, add the motion information corresponding to the at least one target pixel point into the set of candidate motion information of the to-be-processed image block.
After at least one target pixel point is determined in S1901, motion information corresponding to each target pixel point in the at least one target pixel point may be determined.
The motion information corresponding to each target pixel point may include multiple information such as an inter-frame prediction direction, a reference frame, and a motion vector of the target pixel point.
Optionally, the set of candidate motion information to which the motion information corresponding to the at least one target pixel point is added may include motion information of a spatial neighboring block and/or a temporal related position block at a preset position of the image block to be processed.
For example, before S1902, the motion information of the spatial neighboring block and/or the temporal relative position block of the preset position of the to-be-processed image block may be added to the set of candidate motion information of the to-be-processed image block. It should be understood that, after S1902, the motion information of the spatial neighboring block and/or the temporal relative position block of the preset position of the to-be-processed image block may be added to the set of candidate motion information of the to-be-processed image block. This is not limited by the present application.
An example of the set of candidate motion information of the image block to be processed is a fusion candidate list, and accordingly, an example of the motion information corresponding to each target pixel point is motion information corresponding to one fusion candidate.
S1903, determine target motion information from the set of candidate motion information, where the target motion information is used to predict motion information of the to-be-processed image block.
It should be understood that the set of candidate motion information described herein refers to a set including motion information corresponding to the addition of the at least one target pixel point in S1902.
Compared with the prior art, the method selects motion information from the upper neighboring blocks and the left neighboring blocks of the coding tree unit CTU as candidate prediction motion information; less spatial motion information is accessed, and the complexity is lower.
In some possible implementations, the method may be used to decode the image block to be processed. Accordingly, the determining target motion information from the set of candidate motion information may include: parsing the code stream to obtain identification information; and determining the target motion information according to the identification information.
One example of the identification information is a fusion index (merge index). In this case, one example of determining the target motion information according to the identification information is to select the fusion candidate corresponding to the fusion index from the set of candidate motion information, where the motion information corresponding to the fusion candidate is the target motion information.
In other possible implementations, the method may be used to encode the image block to be processed. Accordingly, the determining target motion information from the set of candidate motion information may include: and selecting the motion information with lower coding cost from the candidate motion information set as the target motion information.
One example of the manner of calculating the coding cost is: performing a weighted average calculation on the number of bits used to code the motion information and the distortion generated after the image block is coded using the group of motion information.
For example, motion information having a coding cost less than or equal to a preset coding cost threshold may be selected as the target motion information from the set of candidate motion information.
For example, the motion information with the smallest coding cost may be selected from the set of candidate motion information as the target motion information.
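A sketch of this encoder-side selection, assuming a Lagrangian-style weighted cost of signaling bits and distortion; the callables bits_of/distortion_of and the weight are placeholders for illustration, not a prescribed cost function.

```python
# Encoder-side selection by minimum weighted cost; the cost model and the
# callables bits_of/distortion_of are placeholders for illustration.
def select_target_motion_info(candidates, bits_of, distortion_of, weight):
    def cost(candidate):
        return distortion_of(candidate) + weight * bits_of(candidate)
    return min(candidates, key=cost)
```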
In this possible implementation manner, after the motion information with a smaller coding cost is selected from the set of candidate motion information as the target motion information, the method may further include: encoding the identification information of the target motion information.
When the target motion information is motion information of a fusion candidate, one example of the identification information of the target motion information is a fusion index of the fusion candidate corresponding to the target motion information.
In some possible embodiments, the target pixel point may include a first pixel point, where a straight line where the first pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the straight line where the first pixel point is located, and the second foot is a vertical projection point of a pixel point at a lower left corner of the to-be-processed image block on the straight line where the first pixel point is located.
For example, as shown in fig. 22, the first foot M is a vertical projection point of a pixel point at the upper left corner of the image block to be processed on a straight line L1, and the straight line L1 is parallel to the straight line where the left edge of the coding tree unit is located; the second vertical foot N is a vertical projection point of a pixel point at the lower left corner of the image block to be processed on the straight line L1, and the first pixel point is located on the line segment MN.
Optionally, the first pixel points at multiple different positions may be determined as target pixel points, and the first pixel points at multiple different positions may be pixel points in the same coding unit.
In some possible embodiments, the target pixel point may include a second pixel point, where a straight line where the second pixel point is located is parallel to a straight line where an upper edge of the coding tree unit is located, the second pixel point is located on a line segment with a third vertical foot and a fourth vertical foot as end points, the third vertical foot is a vertical projection point of a pixel point at the upper left corner of the to-be-processed image block on the straight line where the second pixel point is located, and the fourth vertical foot is a vertical projection point of a pixel point at the upper right corner of the to-be-processed image block on the straight line where the second pixel point is located.
For example, as shown in fig. 23, the third foot P is a vertical projection point of a pixel point at the upper left corner of the image block to be processed on a straight line L2, and the straight line L2 is parallel to a straight line where the upper edge of the coding tree unit is located; the fourth foot Q is a vertical projection point of a pixel point at the upper right corner of the image block to be processed on the straight line L2, and the second pixel point is located on the line segment PQ.
Optionally, a plurality of second pixel points at different positions may be determined as the target pixel point, and the plurality of second pixel points at different positions may be pixel points in the same encoding unit.
Optionally, the first pixel points at a plurality of different positions and the second pixel points at a plurality of different positions may be determined as target pixel points. The first pixel points at the plurality of different positions can belong to the same coding unit, and the second pixel points at the plurality of different positions can belong to the same coding unit.
In this embodiment of the application, as shown in fig. 24, the coding tree unit in which the current block is located may be divided into a left half and a right half. The left half is the region on the left of the two regions obtained by vertical division, and the right half is the region on the right; the width of each of the two regions is half the width of the coding tree unit, and the height of each region is the height of the coding tree unit.
It should be understood that the half described here is not limited to an absolute half, but may be approximately a half. For example, when the width of the coding tree unit cannot be divided equally, the widths of the two regions are made as close as possible to half the width of the coding tree unit.
In this application, the left half may also be referred to as the lower left half, and the right half may also be referred to as the upper right half.
In some possible embodiments, for example, in a case that a first pixel point and a second pixel point are determined as target pixel points, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
When the pixel point at the upper left corner of the image block to be processed is located in the upper right half of the coding tree unit, the distance between the image block to be processed and the second pixel point is smaller than the distance between the image block to be processed and the first pixel point, so the correlation is higher and the probability of the second pixel point being selected is higher; therefore, setting the length of the codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information to be less than or equal to the length of the codeword used to represent the motion information corresponding to the first pixel point can improve coding efficiency. When the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, coding efficiency can likewise be improved based on a similar principle.
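As one illustration of why an earlier list position yields a shorter codeword, the sketch below uses a truncated-unary binarization of the candidate index; this particular binarization is an assumption made only for illustration, not something mandated by this embodiment.

```python
# Truncated-unary binarization of a candidate index (one common choice,
# assumed here for illustration): earlier indices get shorter codewords.
def truncated_unary(index, max_index):
    """Binarize index in [0, max_index] as a truncated-unary bit string."""
    bits = "1" * index
    if index < max_index:
        bits += "0"   # the terminating zero is omitted only for the last index
    return bits

# Example with 5 candidates: index 0 -> "0" (1 bit), index 3 -> "1110" (4 bits).
```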
In some possible embodiments, the target pixel point may include a third pixel point, where when a straight line where the third pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the third pixel point is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is an intersection point of the straight line where the first pixel point is located and the straight line where the second pixel point is located; and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third vertical foot and the reference intersection point as end points.
For example, as shown in fig. 25, the reference intersection point is an intersection point O of a straight line L1 and a straight line L2. The same reference numerals in fig. 25 as those in fig. 23 and 24 denote the same meanings.
The third pixel point may be located on the line segment MO, or the third pixel point may be located on the line segment PO. That is to say, one or more pixel points on the line segment MO may be determined as target pixel points, or one or more pixel points on the line segment PO may be determined as target pixel points.
When a plurality of pixel points on the line segment MO are determined as target pixel points, the plurality of pixel points may belong to the same coding unit; when a plurality of pixel points on the line segment PO are determined as target pixel points, the plurality of pixel points may belong to the same coding unit.
Optionally, when the target pixel point includes one or more third pixel points, one or more first pixel points may be further included, one or more second pixel points may be further included, or one or more first pixel points and one or more second pixel points may be further included.
In some possible embodiments, for example, in a case that the target pixel point includes a first pixel point, a second pixel point, and a third pixel point, a length of a codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information, and a length of a codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information. The code word allocation mode has higher coding efficiency.
In some possible embodiments, for example, in a case that the target pixel point includes a third pixel point, when a pixel point at an upper left corner of the to-be-processed image block is located at an upper right half of the coding tree unit, a straight line where the third pixel point is located is parallel to a straight line where an upper edge of the coding tree unit is located; when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
Taking fig. 25 as an example, if the pixel point at the upper left corner of the to-be-processed image block is located in the upper right half of the coding tree unit in which it is located, the third pixel point is located on the line segment OP; if the pixel point at the upper left corner of the to-be-processed image block is located in the lower left half of the coding tree unit in which it is located, the third pixel point is located on the line segment OM.
In some possible embodiments, the target pixel point may include a fourth pixel point, where a straight line where the fourth pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, and the fourth pixel point is located on a ray that uses the second foot as an end point and that uses a direction from the first foot to the second foot as a direction.
As shown in fig. 25, the fourth pixel point may be located on the ray NK, that is, the target pixel point may include a pixel point on the ray NK.
Optionally, the target pixel point may further include a fourth pixel point under the condition that one or more of the first pixel point, the second pixel point, or the third pixel point is included.
In some possible embodiments, the target pixel further includes a fifth pixel, where a straight line where the fifth pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, and the fifth pixel is located on a ray that has the fourth foot as an end point and is directed from the third foot to the fourth foot.
As shown in fig. 25, the fifth pixel may be located on the ray QS, that is, the target pixel may include a pixel on the ray QS.
Optionally, the target pixel point may further include a fifth pixel point under the condition that the target pixel point includes one or more of the first pixel point, the second pixel point, the third pixel point, and the fourth pixel point.
In some possible embodiments, for example, in a case that the target pixel point includes both a fourth pixel point and a fifth pixel point, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information. The code word allocation mode has higher coding efficiency.
In this embodiment of the application, as shown in fig. 26, the coding tree unit in which the current block is located may be divided into an upper left quarter, a lower left quarter, an upper right quarter, and a lower right quarter. The upper left quarter refers to the upper left region of the four regions obtained by quadtree division, the lower left quarter refers to the lower left region, the upper right quarter refers to the upper right region, and the lower right quarter refers to the lower right region. The width of each of the four regions obtained by the quadtree division is half the width of the coding tree unit, and the height of each region is half the height of the coding tree unit.
It should be understood that the half described here is not limited to an absolute half, but may be approximately a half. For example, when the width and height of the coding tree unit cannot be divided equally, the width of each of the four regions is made as close as possible to half the width of the coding tree unit, and the height of each of the four regions is made as close as possible to half the height of the coding tree unit.
In some possible embodiments, the codeword lengths used to represent, in the set of candidate motion information, the motion information corresponding to the first, second, third, fourth, and fifth pixel points satisfy the following relationships (where, for example, "first ≤ second" means that the length of the codeword used to represent the motion information corresponding to the first pixel point is less than or equal to the length of the codeword used to represent the motion information corresponding to the second pixel point). When the pixel point at the upper left corner of the image block to be processed is located in the upper left quarter of the coding tree unit: first ≤ second ≤ third ≤ fifth ≤ fourth. When the pixel point at the upper left corner of the image block to be processed is located in the upper right quarter of the coding tree unit: second ≤ third ≤ fifth ≤ first ≤ fourth. When the pixel point at the upper left corner of the image block to be processed is located in the lower left quarter of the coding tree unit: first ≤ third ≤ fourth ≤ second ≤ fifth. When the pixel point at the upper left corner of the image block to be processed is located in the lower right quarter of the coding tree unit: first ≤ second ≤ third ≤ fifth ≤ fourth. This codeword allocation manner has higher coding efficiency.
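A sketch of how the quarter containing the upper-left pixel of the image block to be processed could select the insertion order of the first to fifth pixel-point candidates (and thus their codeword lengths), following the orderings above; the threshold test and the labels are illustrative assumptions.

```python
# Quarter-dependent ordering of the first..fifth pixel-point candidates,
# following the relationships above; thresholds and labels are illustrative.
def candidate_order(x0, y0, x1, y1, w1, h1):
    right = (x0 - x1) >= w1 / 2   # upper-left pixel lies in the right half of the CTU
    lower = (y0 - y1) >= h1 / 2   # upper-left pixel lies in the lower half of the CTU
    orders = {
        (False, False): ["first", "second", "third", "fifth", "fourth"],  # upper left quarter
        (True, False):  ["second", "third", "fifth", "first", "fourth"],  # upper right quarter
        (False, True):  ["first", "third", "fourth", "second", "fifth"],  # lower left quarter
        (True, True):   ["first", "second", "third", "fifth", "fourth"],  # lower right quarter
    }
    return orders[(right, lower)]
```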
In some possible embodiments, with the rightward horizontal direction as the positive direction of the abscissa of the rectangular coordinate system, the downward vertical direction as the positive direction of the ordinate, (x0, y0) as the coordinates of the top-left pixel of the image block to be processed, and (x1, y1) as the coordinates of the top-left pixel of the coding tree unit, the coordinates of the first pixel point include: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), where H0 is the height of the image block to be processed.
Taking fig. 22 as an example, the first pixel point may be the pixel point corresponding to point M, or the first pixel point may be the pixel point corresponding to point N, or the first pixel point may be the pixel point corresponding to the midpoint of the line segment MN.
In some possible embodiments, the coordinates of the second pixel point include: (x0+ W0-1, y1-1), or (x0+ W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
Taking fig. 23 as an example, the second pixel point may be the pixel point corresponding to point Q, or the second pixel point may be the pixel point corresponding to point P, or the second pixel point may be the pixel point corresponding to the midpoint of the line segment PQ.
In a first possible implementation manner, the coordinates of the third pixel point include: (x0-y0+ y1-1, y1-1) or (x1-1, y0-x0+ x 1-1).
In some possible embodiments, when (x0-x1) > (y0-y1), the coordinates of the third pixel point include (x0-y0+ y1-1, y 1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+ x 1-1).
In some possible embodiments, the coordinates of the fourth pixel point include: (x1-1, y0+ H0+ x0-x1), or (x1-1, y0+ H0+ (x0-x1)/2), or (x1-1, min (y0+ H0+ x0-x1, y1+ H1-1)), wherein H1 is the height of the coding tree unit.
In some possible embodiments, the coordinates of the fifth pixel point include: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1×3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), where W1 is the width of the coding tree unit.
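Gathering one representative choice from the coordinate variants listed above, a sketch of the five target-pixel positions could look as follows; the function name and returned structure are illustrative, and the other listed variants are equally valid.

```python
# One representative choice of the five target-pixel coordinates; illustrative only.
def target_pixel_coordinates(x0, y0, w0, h0, x1, y1, w1, h1):
    return {
        "first":  (x1 - 1, y0 + h0 - 1),
        "second": (x0 + w0 - 1, y1 - 1),
        "third":  (x0 - y0 + y1 - 1, y1 - 1) if (x0 - x1) >= (y0 - y1)
                  else (x1 - 1, y0 - x0 + x1 - 1),
        "fourth": (x1 - 1, min(y0 + h0 + x0 - x1, y1 + h1 - 1)),
        "fifth":  (min(x0 + w0 + y0 - y1, x1 + w1 - 1), y1 - 1),
    }
```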
A specific implementation manner of S1901 and S1902 in the embodiment of the present application may refer to an implementation manner of step 3 in the fusion candidate list construction method provided in the embodiment of the present application, and is not described herein again.
In the method for constructing a fusion candidate list provided in this embodiment of the present application, the current block in step 3 may be the to-be-processed image block in S1901 and S1902, the fusion candidate list in step 3 may be a set of candidate motion information of the to-be-processed image block in S1901 and S1902, and the fusion candidate in step 3 may be motion information corresponding to the target pixel point in S1901 and S1902.
Fig. 20 is a block diagram of an apparatus according to an embodiment of the present application, specifically:
an embodiment of the present application provides an apparatus 2000 for predicting motion information of an image block, including: an obtaining module 2001, configured to determine at least one target pixel point having a preset position relationship with an image block to be processed, where the target pixel point is adjacent to a straight line where an upper edge or a left edge of an encoding tree unit where the image block to be processed is located, and the target pixel point is located outside the encoding tree unit; a list module 2002, configured to add motion information corresponding to the at least one target pixel point to a set of candidate motion information of the image block to be processed; an indexing module 2003, configured to determine target motion information from the set of candidate motion information, where the target motion information is used to predict motion information of the to-be-processed image block.
In a first possible implementation manner, the target pixel point includes a first pixel point, where a straight line where the first pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the straight line where the first pixel point is located, and the second foot is a vertical projection point of a pixel point at a lower left corner of the to-be-processed image block on the straight line where the first pixel point is located.
In a second possible implementation manner, the target pixel further includes a second pixel, where a straight line where the second pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, the second pixel is located on a line segment with a third foot and a fourth foot as end points, the third foot is a vertical projection point of a pixel point at the top left corner of the to-be-processed image block on the straight line where the second pixel is located, and the fourth foot is a vertical projection point of a pixel point at the top right corner of the to-be-processed image block on the straight line where the second pixel is located.
In a third possible implementation manner, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
In a fourth possible implementation manner, the target pixel further includes a third pixel, where when a straight line where the third pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, the third pixel is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is an intersection point of the straight line where the first pixel is located and the straight line where the second pixel is located; and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third vertical foot and the reference intersection point as end points.
In a fifth possible implementation, a length of a codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information, and a length of a codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information.
In a sixth possible implementation manner, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a straight line where the third pixel point is located is parallel to a straight line where the top edge of the coding tree unit is located; when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
In a seventh possible implementation manner, the target pixel further includes a fourth pixel, where a straight line where the fourth pixel is located is parallel to a straight line where a left edge of the coding tree unit is located, and the fourth pixel is located on a ray that takes the second foot as an end point and that takes the direction from the first foot to the second foot as a direction.
In an eighth possible implementation manner, the target pixel further includes a fifth pixel, where a straight line where the fifth pixel is located is parallel to a straight line where an upper edge of the coding tree unit is located, and the fifth pixel is located on a ray that takes the fourth foot as an end point and that takes the direction from the third foot to the fourth foot as a direction.
In a ninth possible implementation manner, when a pixel point at the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information; when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information.
In a tenth possible implementation manner, when a pixel point at the top left corner of the to-be-processed image block is located in the top left quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information.

When a pixel point at the upper left corner of the image block to be processed is located in the upper right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information.

When a pixel point at the upper left corner of the image block to be processed is located in the lower left quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information.

When a pixel point at the upper left corner of the image block to be processed is located in the lower right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information.
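For illustration only, the quadrant-dependent ordering described in the tenth possible implementation manner can be expressed as a short C++ sketch. The enumeration labels, the midpoint test used to decide which quarter of the coding tree unit contains the pixel point at the upper left corner of the image block to be processed, and the assumption that a candidate placed earlier in the set of candidate motion information receives a codeword no longer than that of a later candidate are illustrative choices, not requirements of this application.

#include <array>
#include <cstdint>

// Illustrative labels for the five target pixel points described above.
enum Candidate : uint8_t { kFirst, kSecond, kThird, kFourth, kFifth };

// Returns a candidate order whose codeword lengths are assumed to be
// non-decreasing, chosen by the quarter of the coding tree unit that
// contains the top-left pixel (x0, y0) of the image block; (x1, y1) is the
// top-left pixel of the coding tree unit and W1 x H1 is its size.
std::array<Candidate, 5> CandidateOrder(int x0, int y0,
                                        int x1, int y1, int W1, int H1) {
  const bool rightHalf = (x0 - x1) >= W1 / 2;  // assumed boundary convention
  const bool lowerHalf = (y0 - y1) >= H1 / 2;
  if (!rightHalf && !lowerHalf)                 // upper-left quarter
    return {{kFirst, kSecond, kThird, kFifth, kFourth}};
  if (rightHalf && !lowerHalf)                  // upper-right quarter
    return {{kSecond, kThird, kFifth, kFirst, kFourth}};
  if (!rightHalf && lowerHalf)                  // lower-left quarter
    return {{kFirst, kThird, kFourth, kSecond, kFifth}};
  return {{kFirst, kSecond, kThird, kFifth, kFourth}};  // lower-right quarter
}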
In an eleventh possible implementation manner, taking the horizontal rightward direction as the positive direction of the abscissa of the rectangular coordinate system, taking the vertical downward direction as the positive direction of the ordinate of the rectangular coordinate system, taking (x0, y0) as the coordinates of the upper-left pixel of the image block to be processed, and taking (x1, y1) as the coordinates of the upper-left pixel of the coding tree unit, the coordinates of the first pixel point include: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), wherein H0 is the height of the image block to be processed.
In a twelfth possible implementation manner, the coordinates of the second pixel point include: (x0+W0-1, y1-1), or (x0+W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
In a thirteenth possible implementation manner, the coordinates of the third pixel point include: (x0-y0+y1-1, y1-1) or (x1-1, y0-x0+x1-1).
In a fourteenth possible implementation manner, when (x0-x1) > (y0-y1), the coordinates of the third pixel point include (x0-y0+y1-1, y1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+x1-1).
In a fifteenth possible implementation manner, the coordinates of the fourth pixel point include: (x1-1, y0+H0+x0-x1), or (x1-1, y0+H0+(x0-x1)/2), or (x1-1, min(y0+H0+x0-x1, y1+H1-1)), wherein H1 is the height of the coding tree unit.
In a sixteenth possible implementation, the coordinates of the fifth pixel point include: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1*3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), wherein W1 is the width of the coding tree unit.
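The coordinate options listed in the eleventh to sixteenth possible implementation manners can likewise be illustrated with a small C++ sketch. One option is picked per candidate purely for illustration (the first listed option for the first and second pixel points, the fourteenth implementation manner for the third, and the clipped variants for the fourth and fifth); the type and function names are hypothetical and not part of this application.

#include <algorithm>

struct Point { int x; int y; };

// Illustrative container for the five candidate positions.
struct TargetPixels { Point first, second, third, fourth, fifth; };

// (x0, y0), W0 x H0 describe the image block to be processed; (x1, y1),
// W1 x H1 describe the coding tree unit; x grows rightwards, y downwards.
TargetPixels ComputeTargetPixels(int x0, int y0, int W0, int H0,
                                 int x1, int y1, int W1, int H1) {
  TargetPixels p;
  p.first  = {x1 - 1, y0 + H0 - 1};    // left of the CTU, level with the block's bottom row
  p.second = {x0 + W0 - 1, y1 - 1};    // above the CTU, level with the block's right column
  // Fourteenth implementation manner; the tie case (x0-x1) == (y0-y1) is not
  // specified above, so the second form is used here arbitrarily.
  p.third  = (x0 - x1) > (y0 - y1) ? Point{x0 - y0 + y1 - 1, y1 - 1}
                                   : Point{x1 - 1, y0 - x0 + x1 - 1};
  p.fourth = {x1 - 1, std::min(y0 + H0 + x0 - x1, y1 + H1 - 1)};   // clipped to the CTU bottom row
  p.fifth  = {std::min(x0 + W0 + y0 - y1, x1 + W1 - 1), y1 - 1};   // clipped to the CTU right column
  return p;
}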
In a seventeenth possible implementation, the listing module 2002 is further configured to: add the motion information of a spatially adjacent block and/or a temporally related position block at a preset position of the image block to be processed to the set of candidate motion information of the image block to be processed.
In an eighteenth possible implementation, the apparatus 2000 is configured to decode the image block to be processed, and the indexing module 2003 is specifically configured to: parse a bitstream to obtain identification information, and determine the target motion information according to the identification information.
In a nineteenth possible implementation, the apparatus 2000 is configured to encode the image block to be processed, and the indexing module 2003 is specifically configured to: select the motion information with the minimum coding cost from the set of candidate motion information as the target motion information.
In a twentieth possible implementation, the indexing module 2003 is further configured to: encode the identification information of the target motion information.
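As a rough sketch of how the index module might behave on the encoder and decoder sides (the cost model, data types, and function names below are hypothetical and not fixed by this application):

#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Placeholder for whatever motion information the codec carries
// (motion vectors, reference indices, ...); purely illustrative.
struct MotionInfo { int mvx = 0; int mvy = 0; int refIdx = 0; };

// Encoder side: pick the index of the candidate with the smallest coding
// cost; the cost model is supplied by the caller because this application
// does not fix one. The returned index is the identification information
// that is subsequently encoded.
std::size_t SelectTargetIndex(const std::vector<MotionInfo>& candidates,
                              const std::function<double(const MotionInfo&)>& cost) {
  std::size_t best = 0;
  double bestCost = std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    const double c = cost(candidates[i]);
    if (c < bestCost) { bestCost = c; best = i; }
  }
  return best;
}

// Decoder side: the identification information parsed from the bitstream
// selects the target motion information directly.
const MotionInfo& TargetFromIndex(const std::vector<MotionInfo>& candidates,
                                  std::size_t parsedIndex) {
  return candidates[parsedIndex];
}

Note that the decoder never evaluates coding costs; it only maps the parsed identification information back into the same candidate order that the encoder used when building the set of candidate motion information.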
Fig. 21 is a block diagram showing a schematic configuration of a motion information prediction apparatus 2100 according to an embodiment of the present application. Specifically, the apparatus includes: a processor 2101 and a memory 2102 coupled to the processor; the processor 2101 is configured to perform the embodiment shown in FIG. 19 as well as its various possible implementations.
Although particular aspects of the present application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of the present application may be applied with many other video encoding and/or decoding units, processors, processing units, hardware-based coding units such as encoders/decoders (CODECs), and the like. Moreover, it should be understood that the steps shown and described with respect to FIG. 19 are provided only as a possible implementation. That is, the steps shown in the possible implementation of FIG. 19 need not necessarily be performed in the order shown in FIG. 19, and fewer, additional, or alternative steps may be performed.
Moreover, it is to be understood that certain actions or events of any of the methods described herein can be performed in a different sequence, added, combined, or left out altogether (e.g., not all described actions or events are necessary for the practice of the methods), depending on the possible implementations. Further, in certain possible implementations, acts or events may be performed concurrently, e.g., via multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially. Additionally, although specific aspects of this disclosure are described as being performed by a single module or unit for purposes of clarity, it should be understood that the techniques of this disclosure may be performed by a combination of units or modules associated with a video decoder.
In one or more possible implementations, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol.
In this manner, the computer-readable medium illustratively may correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
Such computer-readable storage media may include, as a possible implementation and not limitation, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that may be used to store desired code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, as used herein, the term "processor" may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Likewise, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described previously, the various units may be combined in a codec hardware unit or provided by an interoperative hardware unit (including one or more processors as described previously) in conjunction with a collection of suitable software and/or firmware.
It is to be understood that "/" in this application means "or", and that "and/or" covers three parallel cases. For example, "A and/or B" may mean: A alone, B alone, or both A and B.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (42)

1. A method for predicting motion information of an image block, comprising:
determining at least one target pixel point having a preset position relationship with an image block to be processed, wherein the target pixel point is adjacent to a straight line where the upper edge of a coding tree unit where the image block to be processed is located or a straight line where the left edge of the coding tree unit is located, and the target pixel point is located outside the coding tree unit;
adding the motion information corresponding to the at least one target pixel point into a set of candidate motion information of the image block to be processed;
and determining target motion information from the candidate motion information set, wherein the target motion information is used for predicting the motion information of the image block to be processed.
2. The method according to claim 1, wherein the target pixel point includes a first pixel point, wherein a straight line where the first pixel point is located is parallel to a straight line where a left edge of the coding tree unit is located, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the image block to be processed on the straight line where the first pixel point is located, and the second foot is a vertical projection point of a pixel point at a lower left corner of the image block to be processed on the straight line where the first pixel point is located.
3. The method according to claim 2, wherein the target pixel further comprises a second pixel, wherein a straight line of the second pixel is parallel to a straight line of an upper edge of the coding tree unit, the second pixel is located on a line segment with a third foot and a fourth foot as end points, the third foot is a vertical projection point of a pixel point at the upper left corner of the image block to be processed on the straight line of the second pixel, and the fourth foot is a vertical projection point of a pixel point at the upper right corner of the image block to be processed on the straight line of the second pixel.
4. The method according to claim 3, wherein when a pixel point in the top left corner of the image block to be processed is located in the top right half of the coding tree unit, the codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to the codeword length for representing the motion information corresponding to the first pixel point in the set of candidate motion information;
when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
5. The method according to claim 3 or 4, wherein the target pixel further comprises a third pixel, wherein when the straight line of the third pixel is parallel to the straight line of the left edge of the coding tree unit, the third pixel is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is the intersection point of the straight line of the first pixel and the straight line of the second pixel;
and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third foot and the reference intersection point as end points.
6. The method of claim 5, wherein a length of a codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information, and wherein a length of a codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used to represent the motion information corresponding to the third pixel point in the set of candidate motion information.
7. The method according to claim 5 or 6, wherein when the pixel point at the top left corner of the to-be-processed image block is located at the top right half of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the top edge of the coding tree unit is located;
when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
8. The method of any of claims 5 to 7, wherein the target pixel further comprises a fourth pixel, wherein a line on which the fourth pixel is located is parallel to a line on which a left edge of the coding tree unit is located, and wherein the fourth pixel is located on a ray that ends at the second foot and is directed from the first foot to the second foot.
9. The method of claim 8, wherein the target pixel further comprises a fifth pixel, wherein a line on which the fifth pixel is located is parallel to a line on which an upper edge of the coding tree unit is located, and wherein the fifth pixel is located on a ray that ends at the fourth foot and is directed from the third foot to the fourth foot.
10. The method according to claim 9, wherein when a pixel point in the top left corner of the image block to be processed is located in the top right half of the coding tree unit, a codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information.
11. The method according to claim 9, wherein when a pixel point at the top left corner of the image block to be processed is located in the top left quarter of the coding tree unit, a codeword length for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing the motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information, and the codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the codeword length for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the upper right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the lower left quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the lower right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information.
12. The method according to any one of claims 2 to 11, wherein the coordinates of the top-left pixel of the image block to be processed are (x0, y0), the coordinates of the top-left pixel of the coding tree unit are (x1, y1), and the coordinates of the first pixel point comprise: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), wherein H0 is the height of the image block to be processed.
13. The method of claim 12, wherein the coordinates of the second pixel point comprise: (x0+W0-1, y1-1), or (x0+W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
14. The method according to claim 12 or 13, wherein the coordinates of the third pixel point comprise: (x0-y0+y1-1, y1-1) or (x1-1, y0-x0+x1-1).
15. The method of any one of claims 12 to 14, wherein when (x0-x1) > (y0-y1), the coordinates of the third pixel point include (x0-y0+y1-1, y1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+x1-1).
16. The method according to any one of claims 12 to 15, wherein the coordinates of the fourth pixel point comprise: (x1-1, y0+H0+x0-x1), or (x1-1, y0+H0+(x0-x1)/2), or (x1-1, min(y0+H0+x0-x1, y1+H1-1)), wherein H1 is the height of the coding tree unit.
17. The method according to any one of claims 12 to 16, wherein the coordinates of the fifth pixel point comprise: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1*3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), wherein W1 is the width of the coding tree unit.
18. The method according to any one of claims 1 to 17, further comprising, before adding the motion information corresponding to the at least one target pixel point to the set of candidate motion information of the image block to be processed, the following steps:
adding the motion information of a spatially adjacent block and/or a temporally related position block at a preset position of the image block to be processed to the set of candidate motion information of the image block to be processed.
19. The method according to any of claims 1 to 18, wherein the method is used for decoding the image block to be processed, wherein the determining target motion information from the set of candidate motion information comprises:
parsing a bitstream to obtain identification information;
and determining the target motion information according to the identification information.
20. The method according to any of claims 1 to 18, wherein the method is used for encoding the image block to be processed, and wherein the determining target motion information from the set of candidate motion information comprises:
selecting the motion information with the minimum coding cost from the set of candidate motion information as the target motion information.
21. The method of claim 20, wherein after selecting the motion information with the smallest coding cost from the set of candidate motion information as the target motion information, the method further comprises:
encoding the identification information of the target motion information.
22. An apparatus for predicting motion information of an image block, comprising:
an acquisition module, configured to determine at least one target pixel point having a preset position relationship with an image block to be processed, wherein the target pixel point is adjacent to a straight line where the upper edge of a coding tree unit where the image block to be processed is located or a straight line where the left edge of the coding tree unit is located, and the target pixel point is located outside the coding tree unit;
a list module, configured to add the motion information corresponding to the at least one target pixel point to a set of candidate motion information of the image block to be processed; and
an index module, configured to determine target motion information from the set of candidate motion information, wherein the target motion information is used for predicting the motion information of the image block to be processed.
23. The apparatus according to claim 22, wherein the target pixel point comprises a first pixel point, wherein a straight line of the first pixel point is parallel to a straight line of a left edge of the coding tree unit, the first pixel point is located on a line segment with a first foot and a second foot as end points, the first foot is a vertical projection point of a pixel point at an upper left corner of the image block to be processed on the straight line of the first pixel point, and the second foot is a vertical projection point of a pixel point at a lower left corner of the image block to be processed on the straight line of the first pixel point.
24. The apparatus according to claim 23, wherein the target pixel further comprises a second pixel, wherein a line on which the second pixel is located is parallel to a line on which an upper edge of the coding tree unit is located, the second pixel is located on a line segment with a third foot and a fourth foot as end points, the third foot is a vertical projection point of a pixel point at an upper left corner of the to-be-processed image block on the line on which the second pixel is located, and the fourth foot is a vertical projection point of a pixel point at an upper right corner of the to-be-processed image block on the line on which the second pixel is located.
25. The apparatus according to claim 24, wherein when a pixel point in the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the first pixel point in the set of candidate motion information;
when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information.
26. The apparatus according to claim 24 or 25, wherein the target pixel further comprises a third pixel, and wherein when the straight line of the third pixel is parallel to the straight line of the left edge of the code tree unit, the third pixel is located on a line segment with the first foot and a reference intersection point as end points, and the reference intersection point is the intersection point of the straight line of the first pixel and the straight line of the second pixel;
and when the straight line where the third pixel point is located is parallel to the straight line where the upper edge of the coding tree unit is located, the third pixel point is located on a line segment which takes the third foot and the reference intersection point as end points.
27. The apparatus of claim 26, wherein a length of a codeword used for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a length of a codeword used for representing the motion information corresponding to the third pixel point in the set of candidate motion information, and wherein a length of a codeword used for representing the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a length of a codeword used for representing the motion information corresponding to the third pixel point in the set of candidate motion information.
28. The apparatus according to claim 26 or 27, wherein when the pixel point at the top left corner of the to-be-processed image block is located at the top right half of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the top edge of the coding tree unit is located;
when the pixel point at the upper left corner of the image block to be processed is located at the lower left half part of the coding tree unit, the straight line where the third pixel point is located is parallel to the straight line where the left edge of the coding tree unit is located.
29. The apparatus of any of claims 26-28, wherein said target pixel further comprises a fourth pixel, wherein said fourth pixel is located on a line parallel to a line along the left edge of said coding tree unit, and wherein said fourth pixel is located on a ray that ends at said second foot and is directed from said first foot to said second foot.
30. The apparatus of claim 29, wherein the target pixel further comprises a fifth pixel, wherein a line on which the fifth pixel is located is parallel to a line on which an upper edge of the coding tree unit is located, and wherein the fifth pixel is located on a ray that ends at the fourth foot and is directed from the third foot to the fourth foot.
31. The apparatus according to claim 30, wherein when a pixel point in the top left corner of the to-be-processed image block is located in the top right half of the coding tree unit, a codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when the pixel point at the upper left corner of the image block to be processed is located in the lower left half of the coding tree unit, the length of the codeword used for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information.
32. The apparatus of claim 30, wherein when a pixel point at the top left corner of the to-be-processed image block is located in the top left quarter of the coding tree unit, a codeword length for representing the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing the motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information, and the codeword length for representing the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the codeword length for representing the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the upper right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the lower left quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the fourth pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information;
when a pixel point at the upper left corner of the image block to be processed is located in the lower right quarter of the coding tree unit, a codeword length for representing motion information corresponding to the first pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the second pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information, and a codeword length for representing motion information corresponding to the third pixel point in the set of candidate motion information is less than or equal to a codeword length for representing motion information corresponding to the fifth pixel point in the set of candidate motion information, and the length of the codeword used to represent the motion information corresponding to the fifth pixel point in the set of candidate motion information is less than or equal to the length of the codeword used to represent the motion information corresponding to the fourth pixel point in the set of candidate motion information.
33. The apparatus according to any one of claims 23 to 32, wherein the coordinates of the top-left pixel of the image block to be processed are (x0, y0), the coordinates of the top-left pixel of the coding tree unit are (x1, y1), and the coordinates of the first pixel point comprise: (x1-1, y0+H0-1), or (x1-1, y0+H0/2), or (x1-1, y0), wherein H0 is the height of the image block to be processed.
34. The apparatus of claim 33, wherein the coordinates of the second pixel point comprise: (x0+W0-1, y1-1), or (x0+W0/2, y1-1), or (x0, y1-1), wherein W0 is the width of the image block to be processed.
35. The apparatus according to claim 33 or 34, wherein the coordinates of the third pixel point comprise: (x0-y0+y1-1, y1-1) or (x1-1, y0-x0+x1-1).
36. The apparatus of any one of claims 33 to 35, wherein when (x0-x1) > (y0-y1), the coordinates of the third pixel point comprise (x0-y0+y1-1, y1-1); when (x0-x1) < (y0-y1), the coordinates of the third pixel point include (x1-1, y0-x0+x1-1).
37. The apparatus according to any one of claims 33 to 36, wherein the coordinates of the fourth pixel point comprise: (x1-1, y0+H0+x0-x1), or (x1-1, y0+H0+(x0-x1)/2), or (x1-1, min(y0+H0+x0-x1, y1+H1-1)), wherein H1 is the height of the coding tree unit.
38. The apparatus according to any one of claims 33 to 37, wherein the coordinates of the fifth pixel point comprise: (x0+W0+y0-y1, y1-1), or (x0+W0+(y0-y1)/2, y1-1), or (min(x0+W0+y0-y1, x1+W1*3/2), y1-1), or (min(x0+W0+y0-y1, x1+W1-1), y1-1), wherein W1 is the width of the coding tree unit.
39. The apparatus of any one of claims 22 to 38, wherein the list module is further configured to:
add the motion information of a spatially adjacent block and/or a temporally related position block at a preset position of the image block to be processed to the set of candidate motion information of the image block to be processed.
40. The apparatus according to any of the claims 22 to 39, wherein said apparatus is configured to decode said to-be-processed image block, and said index module is specifically configured to:
parse a bitstream to obtain identification information; and
determine the target motion information according to the identification information.
41. The apparatus according to any of the claims 22 to 39, wherein the apparatus is configured to encode the to-be-processed image block, and the index module is specifically configured to:
select the motion information with the minimum coding cost from the set of candidate motion information as the target motion information.
42. The apparatus of claim 41, wherein the index module is further configured to:
encode the identification information of the target motion information.
CN201811015602.9A 2018-08-21 2018-08-31 Method and device for predicting motion information of image block Pending CN110855993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/099481 WO2020038232A1 (en) 2018-08-21 2019-08-06 Method and apparatus for predicting movement information of image block

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018109543035 2018-08-21
CN201810954303 2018-08-21

Publications (1)

Publication Number Publication Date
CN110855993A true CN110855993A (en) 2020-02-28

Family

ID=69594532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811015602.9A Pending CN110855993A (en) 2018-08-21 2018-08-31 Method and device for predicting motion information of image block

Country Status (1)

Country Link
CN (1) CN110855993A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022037458A1 (en) * 2020-08-20 2022-02-24 腾讯科技(深圳)有限公司 Method, apparatus and device for constructing motion information list in video coding and decoding

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102860006A (en) * 2010-02-05 2013-01-02 瑞典爱立信有限公司 Managing predicted motion vector candidates
CN102263945A (en) * 2010-05-26 2011-11-30 联发科技(新加坡)私人有限公司 Method For Processing Motion Partitions In Tree-based Motion Compensation And Related Binarization Processing Circuit Thereof
US20140169475A1 (en) * 2012-12-17 2014-06-19 Qualcomm Incorporated Motion vector prediction in video coding
CN107431806A (en) * 2015-03-19 2017-12-01 Lg 电子株式会社 For handling the method and its equipment of vision signal
CN107615765A (en) * 2015-06-03 2018-01-19 联发科技股份有限公司 The method and apparatus of resource-sharing in video coding and decoding system between intra block replication mode and inter-frame forecast mode
WO2017157259A1 (en) * 2016-03-15 2017-09-21 Mediatek Inc. Method and apparatus of video coding with affine motion compensation
WO2017201678A1 (en) * 2016-05-24 2017-11-30 华为技术有限公司 Image prediction method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JICHENG AN, "Enhanced Merge Mode based on JEM7.0", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 10th Meeting: San Diego, US, 10–20 Apr. 2018 *

Similar Documents

Publication Publication Date Title
US10771811B2 (en) Overlapped motion compensation for video coding
US10652571B2 (en) Advanced motion vector prediction speedups for video coding
EP3459245B1 (en) Confusion of multiple filters in adaptive loop filtering in video coding
US10506228B2 (en) Variable number of intra modes for video coding
JP7211816B2 (en) Intra-block copy-merge mode and padding for unavailable IBC reference areas
CN107690810B (en) System and method for determining illumination compensation states for video coding
US10887597B2 (en) Systems and methods of determining illumination compensation parameters for video coding
US11330284B2 (en) Deriving motion information for sub-blocks in video coding
US10542280B2 (en) Encoding optimization with illumination compensation and integer motion vector restriction
CN111213376A (en) Encoding motion information for video data using candidate list construction based on coding structure
US20130022119A1 (en) Buffering prediction data in video coding
KR20210024165A (en) Inter prediction method and apparatus
CN111200735B (en) Inter-frame prediction method and device
KR102449889B1 (en) Sub-block temporal motion vector prediction for video coding
CN113966614A (en) Improvement of merge mode with motion vector difference
CN110896485B (en) Decoding method and device for predicting motion information
CN110546956B (en) Inter-frame prediction method and device
CN110876057B (en) Inter-frame prediction method and device
CN110855993A (en) Method and device for predicting motion information of image block
WO2020038232A1 (en) Method and apparatus for predicting movement information of image block
WO2020024275A1 (en) Inter-frame prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200228