CN111200735A - Inter-frame prediction method and device - Google Patents

Inter-frame prediction method and device

Info

Publication number
CN111200735A
Authority
CN
China
Prior art keywords
block
processed
prediction
image block
image
Prior art date
Legal status
Granted
Application number
CN201811578340.7A
Other languages
Chinese (zh)
Other versions
CN111200735B (en)
Inventor
张娜
陈旭
郑建铧
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2019/110206 (published as WO2020103593A1)
Publication of CN111200735A
Application granted
Publication of CN111200735B
Status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H04N19/176 The coding unit being an image region, e.g. an object, the region being a block, e.g. a macroblock
    • H04N19/503 Predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors

Abstract

Embodiments of this application relate to an inter-frame prediction method and apparatus. The method includes: determining, according to size reference information, the size of a basic prediction block in an image block to be processed, where the size is used to determine the position of the basic prediction block in the image block to be processed; determining, from this position, a first reference block and a second reference block of the basic prediction block, where the left boundary of the first reference block is collinear with the left boundary of the basic prediction block, the upper boundary of the second reference block is collinear with the upper boundary of the basic prediction block, the first reference block is adjacent to the upper boundary of the image block to be processed, and the second reference block is adjacent to the left boundary of the image block to be processed; and performing a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.

Description

Inter-frame prediction method and device
The present application claims priority to Chinese Patent Application No. 201811377897.4, entitled "A method and apparatus for inter-frame prediction", filed with the Chinese Patent Office on November 19, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of video encoding and decoding technologies, and in particular, to a method and an apparatus for inter-frame prediction.
Background
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones (so-called "smart phones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques such as those described in the standards defined by the Moving Picture Experts Group (MPEG), including MPEG-2 and MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and the H.265/High Efficiency Video Coding (HEVC) standard, as well as in extensions of such standards. Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into image blocks, which may also be referred to as treeblocks, coding units (CUs), and/or coding nodes. An image block in an intra-coded (I) slice of an image is encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image. An image block in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Various video coding standards, including the HEVC standard, propose predictive coding modes for image blocks, in which a currently coded image block is predicted based on already coded image blocks. In intra-prediction mode, a currently decoded image block is predicted based on one or more previously decoded neighboring blocks in the same image as the current block; in inter-prediction mode, a currently decoded image block is predicted based on already decoded blocks in a different image.
In current inter prediction, an image block to be processed is divided into basic prediction blocks of a specified, fixed size before the inter prediction mode is performed, which limits the coding performance.
Disclosure of Invention
Embodiments of this application provide an inter-frame prediction method and apparatus that adaptively determine the size of the basic prediction block before performing inter-frame prediction, thereby improving coding performance.
In a first aspect of the present application, an inter-frame prediction method is provided, including: determining, according to size reference information, the size of a basic prediction block in an image block to be processed, where the size is used to determine the position of the basic prediction block in the image block to be processed; then determining a first reference block and a second reference block of the basic prediction block according to the position of the basic prediction block, where the left boundary of the first reference block is collinear with the left boundary of the basic prediction block, the upper boundary of the second reference block is collinear with the upper boundary of the basic prediction block, the first reference block is adjacent to the upper boundary of the image block to be processed, and the second reference block is adjacent to the left boundary of the image block to be processed; and finally performing a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
With this inter-frame prediction method, the size of the basic prediction block is determined adaptively according to the size reference information; reasonable size reference information yields a more suitable basic prediction block size and therefore better inter-frame prediction performance during encoding.
With reference to the first aspect, in a possible implementation manner, the original reference block having a preset positional relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed. In this embodiment, appropriately selecting the reference blocks used to generate the motion vector corresponding to the basic prediction block improves the reliability of the generated motion vector.
With reference to the first aspect or the foregoing possible implementation manners, in another possible implementation manner, the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to its upper left corner point, an image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point, and an image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point. The original reference block having a preset spatial position relationship with the image block to be processed is located outside the image block to be processed. In this embodiment, the spatial reference blocks used to generate the motion vector corresponding to the basic prediction block are appropriately selected, which improves the reliability of the generated motion vector.
With reference to the first aspect or any one of the foregoing possible implementations, in another possible implementation, the original reference block having a preset temporal position relationship with the image block to be processed may include: an image block in the target reference frame located at the lower right corner of the mapped image block and adjacent to its lower right corner point. The original reference block having a preset temporal position relationship with the image block to be processed is located outside the mapped image block; the mapped image block is equal in size to the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in its own image frame. In this embodiment, the temporal reference block used to generate the motion vector corresponding to the basic prediction block is appropriately selected, which improves the reliability of the generated motion vector.
With reference to the first aspect or any one of the foregoing possible implementations, in another possible implementation, the index information and the reference frame list information of the target reference frame may be obtained by parsing the code stream. The code stream refers to a code stream transmitted between encoding/decoding ends. In this embodiment, by configuring the reference frame list and the index information, the target reference frame can be flexibly selected, so that the corresponding temporal reference block is more reliable.
With reference to the first aspect or any one of the foregoing feasible embodiments, in another feasible embodiment, the index information and the reference frame list information of the target reference frame are located in the code stream segment corresponding to the slice header of the slice in which the image block to be processed is located. Because the identification information of the target reference frame is stored in the slice header, all temporal reference blocks of image blocks in the slice share the same reference frame information, which saves bits in the coded stream and improves coding efficiency.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, performing weighted calculation on one or more of a motion vector corresponding to a first reference block of a basic prediction block, a motion vector corresponding to a second reference block of the basic prediction block, and a motion vector corresponding to an original reference block having a preset positional relationship with an image block to be processed to obtain a motion vector corresponding to the basic prediction block may specifically be implemented as: the motion vector corresponding to the basic prediction block is obtained according to the following formula:
P(x, y) = (H × Ph(x, y) + W × Pv(x, y) + H × W) / (2 × H × W), where:
Ph(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
Pv(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
Here, AR is the motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point; BR is the motion vector corresponding to the image block in the target reference frame located at the lower right corner of the image block to be processed and adjacent to its lower right corner point; BL is the motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point; x is the ratio of the horizontal distance from the upper left corner point of the basic prediction block to the upper left corner point of the image block to be processed, to the width of the basic prediction block; y is the ratio of the vertical distance from the upper left corner point of the basic prediction block to the upper left corner point of the image block to be processed, to the height of the basic prediction block; W is the width of the image block to be processed in units of the basic prediction block width, and H is its height in units of the basic prediction block height; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
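As an illustration only, the following Python sketch implements the weighted calculation above. It assumes, beyond the text of this embodiment, that motion vectors are (horizontal, vertical) integer tuples, that A and L are sequences indexed by basic-prediction-block coordinates, and that plain integer division stands in for whatever rounding a concrete codec would define:

```python
def planar_mv(x, y, W, H, A, L, AR, BL, BR):
    """Sketch of P(x, y) per the formulas above.

    x, y -- coordinates of the basic prediction block, in block units
    W, H -- width/height of the image block to be processed, in block units
    A[x] -- MV of the first reference block above column x, i.e. A(x, -1)
    L[y] -- MV of the second reference block left of row y, i.e. L(-1, y)
    AR   -- MV of the spatial block at the upper-right corner
    BL   -- MV of the spatial block at the lower-left corner
    BR   -- temporal MV of the lower-right corner block in the target
            reference frame
    All MVs are (horizontal, vertical) integer tuples.
    """
    def blend(w0, v0, w1, v1, denom=1):
        # Componentwise (w0 * v0 + w1 * v1) / denom with integer division.
        return tuple((w0 * a + w1 * b) // denom for a, b in zip(v0, v1))

    R = blend(H - y - 1, AR, y + 1, BR, H)    # R(W, y): right-column MV
    B = blend(W - x - 1, BL, x + 1, BR, W)    # B(x, H): bottom-row MV
    Ph = blend(W - 1 - x, L[y], x + 1, R)     # horizontal interpolation
    Pv = blend(H - 1 - y, A[x], y + 1, B)     # vertical interpolation
    # Final weighted combination with the H*W rounding offset.
    return tuple((H * ph + W * pv + H * W) // (2 * H * W)
                 for ph, pv in zip(Ph, Pv))
```

Note that Python's floor division rounds toward negative infinity, so a production implementation would substitute the rounding rule the codec actually specifies for negative vector components.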
It should be noted that this formula is only one of several feasible ways of performing the weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset positional relationship with the image block to be processed, so as to obtain the motion vector corresponding to the basic prediction block; the present application is not limited to this embodiment.
With reference to the first aspect or any one of the above possible embodiments, in another possible embodiment, the size reference information may include a first identifier, where the first identifier is used to indicate the size of the basic prediction block. The inter-frame prediction method provided in this application may further include: receiving a code stream, parsing the code stream to obtain the first identifier, and using the size indicated by the first identifier as the size of the basic prediction block. The first identifier may be located in the code stream segment corresponding to any one of the sequence parameter set of the sequence in which the image block to be processed is located, the picture parameter set of the image in which the image block to be processed is located, and the slice header of the slice in which the image block to be processed is located. Adding identification information for the size of the basic prediction block to the auxiliary information of the code stream allows an adapted size to be used for each image, which improves adaptability to the image content and is simple to implement.
With reference to the first aspect or any one of the above possible embodiments, in another possible embodiment, the size reference information may include the size of the planar mode prediction blocks in a previously reconstructed image of the image in which the current image block to be processed is located. A planar mode prediction block is an image block to be processed on which inter prediction is performed according to any feasible implementation of the first aspect, and the previously reconstructed image is an image that precedes, in encoding order, the image in which the current image block to be processed is located.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the determining, according to the size of the planar mode prediction blocks in a previously reconstructed image of the image in which the current image block to be processed is located, the size of the basic prediction block in the current image block to be processed may specifically be implemented as: calculating the average of the products of width and height of all planar mode prediction blocks in the previously reconstructed image; when the average is smaller than a threshold, the size of the basic prediction block in the current image block to be processed is a first size; when the average is greater than or equal to the threshold, the size of the basic prediction block in the current image block to be processed is a second size, where the first size is smaller than the second size. In this embodiment, prior information is used to determine the size of the basic prediction block of the current image block to be processed, and no additional identification information needs to be transmitted, which improves adaptability to the image while ensuring that the coding rate does not increase.
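A minimal sketch of this decision rule follows. The concrete first and second sizes used as defaults are placeholders, not values mandated by this embodiment (which only requires that the first size be smaller than the second), and the fallback when no prior statistics exist is an assumption:

```python
def basic_block_size(planar_blocks, threshold,
                     first_size=(4, 4), second_size=(8, 8)):
    """planar_blocks: (width, height) of every planar mode prediction block
    in the previously reconstructed image(s)."""
    planar_blocks = list(planar_blocks)
    if not planar_blocks:
        return second_size  # no prior statistics; this fallback is an assumption
    avg = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    return first_size if avg < threshold else second_size
```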
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the previously reconstructed image of the image in which the current image block to be processed is located is the reconstructed image that has the same temporal layer identifier as that image and is closest to it in encoding order. Reasonably selecting the closest reference frame in the same temporal layer for collecting the prior statistics improves the reliability of the statistical information.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the previously reconstructed image of the image in which the current image block to be processed is located is the reconstructed image closest to that image in encoding order. Reasonably selecting the closest reference frame for collecting the prior statistics improves the reliability of the statistical information.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the previously reconstructed images of the image in which the current image block to be processed is located are a plurality of images, and correspondingly, calculating the average of the products of width and height of all planar mode prediction blocks in the previously reconstructed image includes: calculating the average of the products of width and height of all planar mode prediction blocks in the plurality of previously reconstructed images. Accumulating statistics over multiple frames to determine the size of the basic prediction block in the current image block improves the reliability of the statistics.
With reference to the first aspect or any one of the above possible embodiments, in another possible embodiment, the threshold is a preset threshold.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, when the POCs of all reference frames of the picture in which the image block to be processed is located are smaller than the POC of that picture, the threshold may be a first threshold; when the POC of at least one reference frame of the picture in which the image block to be processed is located is greater than the POC of that picture, the threshold may be a second threshold, where the first threshold and the second threshold are different. Different thresholds can thus be set for different coding scenarios, improving adaptability to the corresponding coding scenario.
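A sketch of this threshold selection, keyed off the picture order counts (POCs) of the reference frames; the threshold values themselves are left to the configuration, and the comments interpreting the two cases as low-delay-like and random-access-like configurations are an inference rather than the patent's wording:

```python
def pick_threshold(cur_poc, ref_pocs, first_threshold, second_threshold):
    # All reference frames precede the current picture in display order
    # (a low-delay-like configuration): use the first threshold.
    if all(poc < cur_poc for poc in ref_pocs):
        return first_threshold
    # At least one reference frame follows the current picture
    # (a random-access-like configuration): use the second threshold.
    return second_threshold
```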
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, before determining the size of the basic prediction block in the image block to be processed according to the size reference information, the inter-prediction method provided in this application may further include: determining the prediction direction of the image block to be processed.
With reference to the first aspect or any one of the foregoing possible implementations, in another possible implementation, determining the prediction direction of the image block to be processed may include: when the first-direction prediction is valid and the second-direction prediction is invalid, or the second-direction prediction is valid and the first-direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when both the first-direction prediction and the second-direction prediction are valid, the prediction direction of the image block to be processed is bidirectional prediction.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, when at least one temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the first reference frame image list, the first-direction prediction is valid; when no temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the first reference frame image list, the first-direction prediction is invalid. When at least one temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the second reference frame image list, the second-direction prediction is valid; when no temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the second reference frame image list, the second-direction prediction is invalid.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the motion vector includes a first motion vector and/or a second motion vector, where the first motion vector corresponds to the first reference frame image list and the second motion vector corresponds to the second reference frame image list. When the first motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the first reference frame image list are different, the first-direction prediction is valid; when the first motion vectors of all such temporary image blocks are the same, the first-direction prediction is invalid. Likewise, when the second motion vectors of at least two temporary image blocks in the neighboring area that obtain motion vectors using the second reference frame image list are different, the second-direction prediction is valid; when the second motion vectors of all such temporary image blocks are the same, the second-direction prediction is invalid.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, when the temporary image block obtains the motion vector only by using the first reference frame image list, the first motion vector and the motion vector are the same; when the temporary image block obtains the motion vector using only the second reference frame image list, the second motion vector is the same as the motion vector.
With reference to the first aspect or any one of the above possible embodiments, in another possible embodiment, the temporary image block is an image block having a preset size.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the neighboring area of the image block to be processed may include: one or any combination of the left spatial domain region, the upper spatial domain region, the right temporal domain region, and the lower temporal domain region of the image block to be processed.
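Pulling the preceding few embodiments together, the following sketch determines the prediction direction from the temporary image blocks in the neighboring area. It assumes each temporary block is represented as an (mv0, mv1) pair, where mv0/mv1 is the block's motion vector from the first/second reference frame image list, or None if that list was not used; direction_valid implements the stricter variant above (at least two distinct motion vectors), and the behavior when neither direction is valid is an assumption:

```python
def direction_valid(mvs):
    # Valid per the refinement above: at least two temporary blocks use this
    # reference frame image list and their motion vectors differ.
    mvs = [mv for mv in mvs if mv is not None]
    return len(set(mvs)) >= 2

def prediction_direction(neighbors):
    """neighbors: list of (mv0, mv1) pairs for the temporary image blocks in
    the left/upper spatial and right/lower temporal neighboring areas."""
    first_ok = direction_valid([mv0 for mv0, _ in neighbors])
    second_ok = direction_valid([mv1 for _, mv1 in neighbors])
    if first_ok and second_ok:
        return "bidirectional"
    if first_ok or second_ok:
        return "unidirectional"
    return None  # neither direction valid; handling here is an assumption
```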
With reference to the first aspect or any one of the above possible implementations, in another possible implementation, the determining a size of a basic prediction block in an image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are not equal, determining that the side length of the shorter side of the basic prediction block is 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, it is determined that the side length of the basic prediction block is 4 or 8. In this embodiment, the size of the basic prediction block is fixed, reducing complexity.
With reference to the first aspect or any one of the above possible implementations, in another possible implementation, the size reference information includes shape information of the image block to be processed; the shape information includes a width and a height. Specifically, determining the size of the basic prediction block according to the size reference information may include: when the width of an image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
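As a sketch, this shape rule reduces to a one-line decision (sizes in pixels, as in the text above):

```python
def size_from_shape(width, height):
    # Returns (basic block width, basic block height) in pixels.
    return (8, 4) if width >= height else (4, 8)
```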
With reference to the first aspect or any one of the above possible embodiments, in another possible embodiment, the basic prediction block has a width of 8 pixels and a height of 8 pixels.
With reference to the first aspect or any one of the above possible implementations, in another possible implementation, the size reference information includes a prediction direction of the image block to be processed. Determining the size of the basic prediction block in the image block to be processed according to the size reference information may include: and determining the size of the basic prediction block according to the prediction direction of the image block to be processed.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, determining a size of a basic prediction block according to a prediction direction of an image block to be processed includes: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels, or the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, determining the size of the basic prediction block according to the prediction direction of the image block to be processed may include: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, determining a size of a basic prediction block according to a prediction direction of an image block to be processed includes: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, determining a size of a basic prediction block according to a prediction direction of an image block to be processed includes: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
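For illustration, here is a sketch of the second of the four variants above, the one that combines the prediction direction with the block shape; the other variants differ only in the sizes they return:

```python
def size_from_direction(direction, width, height):
    # Sizes in pixels, per the variant that also considers block shape.
    if direction == "unidirectional":
        return (4, 4)
    # Bidirectional prediction: orient the 8x4 block along the longer side.
    return (8, 4) if width >= height else (4, 8)
```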
It should be noted that the foregoing provides examples of specific implementations for determining the size of the basic prediction block when the size reference information contains different kinds of content. It should be understood that the size reference information may include one or more kinds of content; when it includes several, the size of the basic prediction block may be determined according to actual requirements by combining them, and the details of that determination process are not repeated here.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, the inter prediction method provided in this application may further include, after determining the size of the basic prediction block in the image block to be processed: dividing the image block to be processed into a plurality of basic prediction blocks according to the determined size, and determining in turn the position of each basic prediction block in the image block to be processed. It should be understood that this embodiment determines the coordinate position of each basic prediction block in the image block to be processed, after which inter prediction is performed on each basic prediction block in the current image block to be processed.
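A sketch of this partitioning step, enumerating the top-left offset of each basic prediction block inside the image block to be processed (all quantities in pixels):

```python
def basic_block_positions(block_width, block_height, bpb_width, bpb_height):
    # Raster-scan order: left to right within a row, rows top to bottom.
    return [(x, y)
            for y in range(0, block_height, bpb_height)
            for x in range(0, block_width, bpb_width)]

# For example, a 16x16 image block with 8x4 basic prediction blocks yields
# eight positions: (0, 0), (8, 0), (0, 4), (8, 4), ..., (8, 12).
```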
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, before determining the size of the basic prediction block in the image block to be processed according to the size reference information, the inter-prediction method provided in this application may further include: determining that the first reference block and the second reference block are located within the boundary of the image in which the current image block to be processed is located, that is, determining that the current image block to be processed is not at the boundary of its image. When the first reference block or the second reference block of the current image block to be processed does not exist, the inter-frame prediction method provided in this application is not used, because the accuracy of the prediction method decreases when the first reference block and the second reference block do not exist.
With reference to the first aspect or any one of the foregoing possible embodiments, in another possible embodiment, before determining the size of the basic prediction block in the image block to be processed according to the size reference information, the inter-prediction method provided in this application may further include: determining that both the width and the height of the image block to be processed are greater than or equal to 16; or determining that the width of the image block to be processed is greater than or equal to 16; or determining that the height of the image block to be processed is greater than or equal to 16. With this embodiment, the inter-frame prediction method provided in this application is not used when the image block to be processed is too small, which balances coding efficiency against complexity.
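The three alternative size gates above reduce to simple comparisons; a sketch of the first (and strictest) alternative:

```python
def planar_mode_allowed(width, height):
    # Both dimensions of the image block to be processed must be >= 16.
    return width >= 16 and height >= 16
```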
In combination with the first aspect or any one of the above possible embodiments, in another possible embodiment, the inter prediction method provided by the present application may be used to encode an image block to be processed, or to decode an image block to be processed. It should be appreciated that embodiments of the present application relate to an inter-frame prediction method, which belongs to both a part of an encoding process and a part of a decoding process in a hybrid encoding architecture.
In a second aspect of the present application, an inter-frame prediction apparatus is provided, including: a determining module, configured to determine, according to size reference information, the size of a basic prediction block in an image block to be processed, where the size is used to determine the position of the basic prediction block in the image block to be processed; a positioning module, configured to determine a first reference block and a second reference block of the basic prediction block according to the position of the basic prediction block in the image block to be processed, where the left boundary of the first reference block is collinear with the left boundary of the basic prediction block, the upper boundary of the second reference block is collinear with the upper boundary of the basic prediction block, the first reference block is adjacent to the upper boundary of the image block to be processed, and the second reference block is adjacent to the left boundary of the image block to be processed; and a calculating module, configured to perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed, to obtain the motion vector corresponding to the basic prediction block.
With reference to the second aspect, in a possible implementation manner, the original reference block having a preset positional relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to its upper left corner point, an image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point, and an image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point. The original reference block having a preset spatial position relationship with the image block to be processed is located outside the image block to be processed.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the original reference block having a preset temporal position relationship with the image block to be processed may include: an image block in the target reference frame located at the lower right corner of the mapped image block and adjacent to its lower right corner point. The original reference block having a preset temporal position relationship with the image block to be processed is located outside the mapped image block; the mapped image block is equal in size to the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in its own image frame.
In combination with the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the index information and the reference frame list information of the target reference frame may be obtained by parsing the code stream. The code stream refers to a code stream transmitted between encoding/decoding ends.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the index information and the reference frame list information of the target reference frame are located in the code stream segment corresponding to the slice header of the slice in which the image block to be processed is located.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the calculating module is specifically configured to obtain a motion vector corresponding to the basic prediction block according to the following formula:
P(x, y) = (H × Ph(x, y) + W × Pv(x, y) + H × W) / (2 × H × W), where:
Ph(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
Pv(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
Here, AR is the motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point; BR is the motion vector corresponding to the image block in the target reference frame located at the lower right corner of the image block to be processed and adjacent to its lower right corner point; BL is the motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point; x is the ratio of the horizontal distance from the upper left corner point of the basic prediction block to the upper left corner point of the image block to be processed, to the width of the basic prediction block; y is the ratio of the vertical distance from the upper left corner point of the basic prediction block to the upper left corner point of the image block to be processed, to the height of the basic prediction block; W is the width of the image block to be processed in units of the basic prediction block width, and H is its height in units of the basic prediction block height; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
In combination with the second aspect or any one of the above possible implementations, in another possible implementation, the size reference information may include a first identifier, where the first identifier is used to indicate the size of the basic prediction block. The apparatus further includes: a receiving unit, configured to receive a code stream; and a parsing unit, configured to parse the code stream received by the receiving unit to obtain the first identifier. The first identifier is located in the code stream segment corresponding to any one of the sequence parameter set of the sequence in which the image block to be processed is located, the picture parameter set of the image in which the image block to be processed is located, and the slice header of the slice in which the image block to be processed is located.
In combination with the second aspect or any one of the above possible implementations, in another possible implementation, the size reference information may include the size of the planar mode prediction blocks in a previously reconstructed image of the image in which the current image block to be processed is located. A planar mode prediction block is an image block to be processed on which inter prediction is performed according to any feasible implementation of the second aspect, and the previously reconstructed image is an image that precedes, in encoding order, the image in which the current image block to be processed is located.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: calculating an average of the products of width and height of all plane mode prediction blocks in the previously reconstructed image; when the average value is smaller than a threshold value, the size of the basic prediction block in the current image block to be processed is a first size; when the average value is greater than or equal to the threshold value, the size of the basic prediction block in the current image block to be processed is a second size. Wherein the first size is smaller than the second size.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, a previous reconstructed image of an image in which the current image block to be processed is located is a reconstructed image that has a same temporal layer identifier as the image in which the current image block to be processed is located and has a coding sequence closest to the image in which the current image block to be processed is located.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, a previous reconstructed image of an image in which the current image block to be processed is located is a reconstructed image that is closest to the image in which the current image block to be processed is located in the encoding order.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the previously reconstructed images of the image in which the current image block to be processed is located are multiple images, and correspondingly, the determining module is specifically configured to: an average of the products of width and height of all plane mode predicted blocks in a plurality of previously reconstructed images is calculated.
In combination with the second aspect or any one of the above possible implementations, in another possible implementation, the threshold is a preset threshold.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, when the POCs of all reference frames of the picture in which the image block to be processed is located are smaller than the POC of that picture, the threshold may be a first threshold; when the POC of at least one reference frame of the picture in which the image block to be processed is located is greater than the POC of that picture, the threshold may be a second threshold, where the first threshold and the second threshold are different.
In combination with the second aspect or any one of the above possible implementations, in another possible implementation, the determining module is further configured to determine the prediction direction of the image block to be processed.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: when the first-direction prediction is valid and the second-direction prediction is invalid, or the second-direction prediction is valid and the first-direction prediction is invalid, determine that the prediction direction of the image block to be processed is unidirectional prediction; and when both the first-direction prediction and the second-direction prediction are valid, determine that the prediction direction of the image block to be processed is bidirectional prediction.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, when at least one temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the first reference frame image list, the first-direction prediction is valid; when no temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the first reference frame image list, the first-direction prediction is invalid. When at least one temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the second reference frame image list, the second-direction prediction is valid; when no temporary image block in the neighboring area of the image block to be processed obtains its motion vector using the second reference frame image list, the second-direction prediction is invalid.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the motion vector includes a first motion vector and/or a second motion vector, where the first motion vector corresponds to the first reference frame image list and the second motion vector corresponds to the second reference frame image list. When the first motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the first reference frame image list are different, the first-direction prediction is valid; when the first motion vectors of all such temporary image blocks are the same, the first-direction prediction is invalid. Likewise, when the second motion vectors of at least two temporary image blocks in the neighboring area that obtain motion vectors using the second reference frame image list are different, the second-direction prediction is valid; when the second motion vectors of all such temporary image blocks are the same, the second-direction prediction is invalid.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, when the temporary image block obtains the motion vector only by using the first reference frame image list, the first motion vector and the motion vector are the same; when the temporary image block obtains the motion vector using only the second reference frame image list, the second motion vector is the same as the motion vector.
In combination with the second aspect or any one of the above possible implementations, in another possible implementation, the temporary image block is an image block having a preset size.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the neighboring area of the image block to be processed may include: one or any combination of the left spatial domain region, the upper spatial domain region, the right temporal domain region, and the lower temporal domain region of the image block to be processed.
With reference to the second aspect or any one of the above possible embodiments, in another possible embodiment, determining the size of the basic prediction block in the image block to be processed may include: when the side lengths of two adjacent sides of the basic prediction block are not equal, determining that the side length of the shorter side of the basic prediction block is 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, it is determined that the side length of the basic prediction block is 4 or 8.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the size reference information includes shape information of the image block to be processed; the shape information includes a width and a height. The determination module is specifically configured to: when the width of an image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
With reference to the second aspect or any one of the above possible implementations, in another possible implementation, the basic prediction block has a width of 8 pixels and a height of 8 pixels.
With reference to the second aspect or any one of the above possible implementations, in another possible implementation, the size reference information includes a prediction direction of the image block to be processed. The determination module is specifically configured to determine the size of the basic prediction block according to the prediction direction of the image block to be processed.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels, or the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
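As a non-normative illustration of the rule in the preceding paragraph, a C++ sketch combining the prediction direction with the block shape could look as follows; all names are assumptions of this sketch.

```cpp
#include <cstdint>

struct BlockSize { uint32_t width, height; };  // in pixels
enum class PredDir { Uni, Bi };

// Unidirectional prediction: 4x4 basic prediction block.
// Bidirectional prediction: 8x4 when the image block to be processed is
// at least as wide as it is high, otherwise 4x8.
BlockSize basicPredBlockSizeFromDir(PredDir dir,
                                    uint32_t blockWidth, uint32_t blockHeight) {
    if (dir == PredDir::Uni) {
        return {4, 4};
    }
    return (blockWidth >= blockHeight) ? BlockSize{8, 4} : BlockSize{4, 8};
}
```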
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels.
With reference to the second aspect or any one of the foregoing possible implementations, in another possible implementation, the determining module is specifically configured to: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the inter-frame prediction apparatus provided by the present application may further include a dividing module, configured to: divide the image block to be processed into a plurality of basic prediction blocks according to the size of the basic prediction block, and determine, in turn, the position of each basic prediction block in the image block to be processed.
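A minimal sketch of such a dividing step, assuming a simple raster-scan order (the order and all names are assumptions of this sketch):

```cpp
#include <cstdint>
#include <vector>

struct Pos { uint32_t x, y; };  // position relative to the image block

// Raster-scan the image block to be processed into basic prediction
// blocks of the determined size and record each block's position.
std::vector<Pos> divideIntoBasicPredBlocks(uint32_t blockWidth, uint32_t blockHeight,
                                           uint32_t predWidth, uint32_t predHeight) {
    std::vector<Pos> positions;
    for (uint32_t y = 0; y + predHeight <= blockHeight; y += predHeight) {
        for (uint32_t x = 0; x + predWidth <= blockWidth; x += predWidth) {
            positions.push_back({x, y});
        }
    }
    return positions;
}
```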
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the inter-frame prediction apparatus provided by the present application may further include a determining module, configured to: determine that the first reference block and the second reference block are located within the boundary of the image in which the image block to be processed is located.
With reference to the second aspect or any one of the foregoing possible implementation manners, in another possible implementation manner, the inter-frame prediction apparatus provided by the present application may further include a determining module, configured to: determine that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or determine that the width of the image block to be processed is greater than or equal to 16; or determine that the height of the image block to be processed is greater than or equal to 16.
In combination with the second aspect or any one of the foregoing possible implementations, in another possible implementation, the inter-prediction apparatus provided in this application is used to encode an image block to be processed, or decode an image block to be processed.
In a third aspect of the present application, there is provided an inter-frame prediction apparatus, including: a processor and a memory coupled to the processor; the processor is configured to perform the method for inter-frame prediction according to the first aspect or any possible implementation manner.
In a fourth aspect of the present application, a computer-readable storage medium is provided, in which instructions are stored, and when the instructions are executed on a computer, the instructions cause the computer to perform the method for inter-frame prediction according to the first aspect or any possible implementation manner.
In a fifth aspect of the present application, there is provided a computer program product comprising instructions that, when executed on a computer, cause the computer to perform the method for inter-frame prediction according to the first aspect or any possible implementation manner.
In a sixth aspect of the present application, there is provided a video image encoder, which includes the inter-prediction apparatus according to the second aspect or any feasible implementation manner.
In a seventh aspect of the present application, a video image decoder is provided, where the video image decoder includes the apparatus for inter-frame prediction according to the second aspect or any feasible implementation manner.
It should be understood that the technical solutions of the second to seventh aspects of the present application are consistent with those of the first aspect, and the beneficial effects obtained by the various aspects and the corresponding feasible implementations are similar and are not described again.
Drawings
FIG. 1 is an exemplary block diagram of a video coding system in an embodiment of the present application;
FIG. 2 is an exemplary block diagram of a video encoder in an embodiment of the present application;
FIG. 3 is an exemplary block diagram of a video decoder in an embodiment of the present application;
FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a position relationship between a to-be-processed image block and a reference block thereof according to an embodiment of the present application;
FIG. 6 is an exemplary flowchart of a method for inter-prediction in an embodiment of the present application;
FIG. 7 is a diagram illustrating a scenario of weighted calculation of motion vectors corresponding to basic prediction blocks according to an embodiment of the present application;
FIG. 8 is a diagram illustrating another scenario of weighted calculation of motion vectors corresponding to basic prediction blocks in an embodiment of the present application;
FIG. 9 is a diagram illustrating another scenario of weighted calculation of motion vectors corresponding to basic prediction blocks in an embodiment of the present application;
FIG. 10 is an exemplary block diagram of an apparatus for inter-prediction in an embodiment of the present application;
FIG. 11 is an exemplary block diagram of a decoding device in an embodiment of the present application.
Detailed Description
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and the above-described drawings are used for distinguishing between different objects and not for limiting a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete manner.
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
FIG. 1 is a block diagram of a video coding system 1 of one example described in an embodiment of the present application. As used herein, the term "video coder" refers generally to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system 1 are configured to predict motion information, such as motion vectors, of a currently coded image block or a sub-block thereof according to various method examples described in any one of a plurality of new inter prediction modes proposed in the present application, such that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method. This eliminates the need to transmit a motion vector difference during encoding and thereby further improves coding and decoding performance.
As shown in fig. 1, video coding system 1 includes a source device 10 and a destination device 20. Source device 10 generates encoded video data. Accordingly, source device 10 may be referred to as a video encoding device. Destination device 20 may decode the encoded video data generated by source device 10. Destination device 20 may therefore be referred to as a video decoding device. Various implementations of source device 10, destination device 20, or both may include one or more processors and memory coupled to the one or more processors. The memory can include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures that can be accessed by a computer, as described herein.
Source device 10 and destination device 20 may comprise a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 20 may receive encoded video data from source device 10 over link 30. Link 30 may comprise one or more media or devices capable of moving encoded video data from source device 10 to destination device 20. In one example, link 30 may comprise one or more communication media that enable source device 10 to transmit encoded video data directly to destination device 20 in real-time. In this example, source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to destination device 20. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). The one or more communication media may include a router, switch, base station, or other apparatus that facilitates communication from source device 10 to destination device 20.
In another example, encoded data may be output from output interface 140 to storage device 40. Similarly, encoded data may be accessed from storage device 40 through input interface 240. Storage device 40 may comprise any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray discs, Digital Versatile Discs (DVDs), compact disc read-only memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
In another example, storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 10. Destination device 20 may access the stored video data from storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 20. Example file servers include web servers (e.g., for websites), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, or local disk drives. Destination device 20 may access the encoded video data over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a wireless fidelity (Wi-Fi) connection), a wired connection (e.g., a Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
The methods of inter-frame prediction provided herein may be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding for video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system 1 illustrated in fig. 1 is merely an example, and the techniques of this application may be applied to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, the data is retrieved from local storage, streamed over a network, and so forth. A video encoding device may encode and store data to a memory, and/or a video decoding device may retrieve and decode data from a memory. In many examples, the encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
In the example of fig. 1, source device 10 includes video source 120, video encoder 100, and output interface 140. In some examples, output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. Video source 120 may comprise a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
Video encoder 100 may encode video data from video source 120. In some examples, source device 10 transmits the encoded video data directly to destination device 20 via output interface 140. In other examples, encoded video data may also be stored onto storage device 40 for later access by destination device 20 for decoding and/or playback.
In the example of fig. 1, destination device 20 includes input interface 240, video decoder 200, and display device 220. In some examples, input interface 240 includes a receiver and/or a modem. Input interface 240 may receive encoded video data via link 30 and/or from storage device 40. Display device 220 may be integrated with destination device 20 or may be external to destination device 20. In general, display device 220 displays decoded video data. The display device 220 may include various display devices, such as a Liquid Crystal Display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
Although not shown in fig. 1, in some aspects, video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle encoding of both audio and video in a common data stream or separate data streams. In some examples, the multiplexer-demultiplexer (MUX-DEMUX) units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits such as: one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the present application is implemented in part in software, a device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may execute the instructions in hardware using one or more processors to implement the techniques of the present application. Any of the foregoing, including hardware, software, a combination of hardware and software, etc., may be considered one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.
This application may generally refer to video encoder 100 as "signaling" or "transmitting" certain information to another device, such as video decoder 200. The terms "signaling" or "transmitting" may generally refer to the communication of syntax elements and/or other data used to decode compressed video data. This transfer may occur in real time or near real time. Alternatively, such communication may occur over a period of time, such as when, at the time of encoding, syntax elements are stored in an encoded codestream to a computer-readable storage medium; the decoding device may then retrieve the syntax elements at any time after they are stored to such medium.
The H.265 (HEVC) standard was developed by the JCT-VC. HEVC standardization is based on an evolution model of the video decoding device called the HEVC test model (HM). The latest standard document for H.265 is available from http://www.itu.int/REC/T-REC-H.265; the latest version of the standard document is H.265 (12/16), which is incorporated herein by reference in its entirety. The HM assumes that the video decoding device has several additional capabilities relative to existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while the HM may provide up to 35 intra-prediction encoding modes.
JVET is dedicated to developing the H.266 standard. The process of H.266 standardization is based on an evolving model of the video decoding apparatus called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/JVET, with the latest algorithm description contained in JVET-F1001-v2, which is incorporated herein by reference in its entirety. Also, reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/ and is likewise incorporated herein by reference in its entirety.
In general, the working model of the HM describes that a video frame or image may be divided into a sequence of treeblocks or Largest Coding Units (LCUs), also referred to as Coding Tree Units (CTUs), that contain both luma and chroma samples. Treeblocks have a purpose similar to that of macroblocks in the H.264 standard. A slice includes a number of consecutive treeblocks in decoding order. A video frame or image may be partitioned into one or more slices. Each treeblock may be split into Coding Units (CUs) according to a quadtree. For example, a treeblock that is the root node of the quadtree may be split into four child nodes, and each child node may in turn be a parent node and be split into four further child nodes. The final unsplittable child node, which is a leaf node of the quadtree, comprises a decoding node, e.g., a decoded video block. Syntax data associated with the decoded codestream may define the maximum number of times the treeblock may be split, and may also define the minimum size of the decoding node.
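For exposition, the quadtree splitting described above can be sketched in C++ as follows; the split decision is a placeholder (a real encoder decides by rate-distortion analysis), and all names are assumptions of this sketch.

```cpp
#include <array>
#include <cstdint>
#include <memory>

// A treeblock (CTU) node either is a leaf (a decoding node) or splits
// into four equally sized child nodes, down to a minimum node size.
struct QuadNode {
    uint32_t x, y, size;  // luma position and side length
    std::array<std::unique_ptr<QuadNode>, 4> children;  // all empty => leaf
};

std::unique_ptr<QuadNode> split(uint32_t x, uint32_t y, uint32_t size,
                                uint32_t minSize, bool (*shouldSplit)(uint32_t)) {
    auto node = std::make_unique<QuadNode>(QuadNode{x, y, size, {}});
    if (size > minSize && shouldSplit(size)) {
        uint32_t h = size / 2;
        node->children[0] = split(x,     y,     h, minSize, shouldSplit);
        node->children[1] = split(x + h, y,     h, minSize, shouldSplit);
        node->children[2] = split(x,     y + h, h, minSize, shouldSplit);
        node->children[3] = split(x + h, y + h, h, minSize, shouldSplit);
    }
    return node;
}
```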
A coding unit includes a decoding node, and prediction units (PUs) and transform units (TUs) associated with the decoding node. The size of a CU corresponds to the size of the decoding node and must be square in shape. The size of a CU may range from 8 × 8 pixels up to a maximum treeblock size of 64 × 64 pixels or more. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe a situation in which the CU is partitioned into one or more PUs. The partition mode may differ depending on whether the CU is skip mode encoded, direct mode encoded, intra prediction mode encoded, or inter prediction mode encoded. The PU may be partitioned into shapes other than square. For example, syntax data associated with a CU may also describe a situation in which the CU is partitioned into one or more TUs according to a quadtree. The TU may be square or non-square in shape.
The HEVC standard allows for transform according to TUs, which may be different for different CUs. A TU is typically sized based on the size of a PU within a given CU defined for a partitioned LCU, although this may not always be the case. The size of a TU is typically the same as or smaller than that of a PU. In some possible implementations, the residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT). The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.
In general, a PU includes data related to a prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode of the PU. As another possible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. For example, the data defining the motion vector for the PU may describe a horizontal component of the motion vector, a vertical component of the motion vector, a resolution of the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list of the motion vector (e.g., list 0, list 1, or list C).
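The inter-mode data enumerated above might be grouped as in the following illustrative C++ structure; the field names and the precision encoding are assumptions of this sketch, not a normative layout.

```cpp
#include <cstdint>

struct PuMotionData {
    int32_t mvHor;        // horizontal motion vector component
    int32_t mvVer;        // vertical motion vector component
    uint8_t mvPrecision;  // e.g. 2 = quarter-pel, 3 = eighth-pel
    uint8_t refIdx;       // index of the reference picture pointed to
    uint8_t refList;      // 0 = list 0, 1 = list 1, 2 = list C
};
```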
In general, TUs use a transform and quantization process. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 100 may calculate residual values corresponding to the PU. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using TUs to produce serialized transform coefficients for entropy coding. The term "video block" is generally used herein to refer to a decoding node of a CU. In some particular applications, the present application may also use the term "video block" to refer to a treeblock that includes a decoding node as well as PUs and TUs, e.g., an LCU or CU.
A video sequence typically comprises a series of video frames or images. A group of pictures (GOP) illustratively comprises a series of one or more video pictures. The GOP may include syntax data in header information of the GOP, in header information of one or more of the pictures, or elsewhere, the syntax data describing the number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes the coding mode of the respective picture. Video encoder 100 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes and may differ in size according to the specified decoding standard.
As a possible implementation, the HM supports prediction of various PU sizes. Assuming that the size of a particular CU is 2N × 2N, the HM supports intra prediction of PU sizes of 2N × 2N or N × N, and inter prediction of symmetric PU sizes of 2N × 2N, 2N × N, N × 2N, or N × N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N × nU, 2N × nD, nL × 2N, and nR × 2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by "n" followed by "Up", "Down", "Left", or "Right". Thus, for example, "2N × nU" refers to a horizontally partitioned 2N × 2N CU with a 2N × 0.5N PU on top and a 2N × 1.5N PU on the bottom.
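A small C++ sketch of the asymmetric partition geometry just described, under the assumption that cuSize equals 2N in luma samples; the pair is (top, bottom) for horizontal splits and (left, right) for vertical splits, and all names are illustrative.

```cpp
#include <cstdint>
#include <utility>

struct PartSize { uint32_t width, height; };

// Each asymmetric mode yields a 25% part and a 75% part along one
// direction of a 2N x 2N CU.
std::pair<PartSize, PartSize> asymmetricParts(uint32_t cuSize, char mode) {
    uint32_t quarter = cuSize / 4, rest = cuSize - quarter;
    switch (mode) {
        case 'U': return {{cuSize, quarter}, {cuSize, rest}};  // 2NxnU: 25% on top
        case 'D': return {{cuSize, rest}, {cuSize, quarter}};  // 2NxnD: 25% on bottom
        case 'L': return {{quarter, cuSize}, {rest, cuSize}};  // nLx2N: 25% on left
        default:  return {{rest, cuSize}, {quarter, cuSize}};  // nRx2N: 25% on right
    }
}
```

For cuSize = 64 (N = 32), mode 'U' returns a 64 × 16 top PU and a 64 × 48 bottom PU, matching the 2N × 0.5N / 2N × 1.5N description above.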
In this application, "N × N" and "N by N" are used interchangeably to refer to the pixel size of a video block in the vertical and horizontal dimensions, e.g., 16 × 16 pixels or 16 by 16 pixels. In general, a 16 × 16 block will have 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N × N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N × M pixels, where M is not necessarily equal to N.
After intra-predictive or inter-predictive coding of the PUs of a CU, video encoder 100 may calculate residual data for the TUs of the CU. A PU may comprise pixel data in the spatial domain (also referred to as the pixel domain), and a TU may comprise coefficients in the transform domain after a transform (e.g., a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the PUs. Video encoder 100 may form TUs that include the residual data of the CU, and then transform the TUs to generate transform coefficients for the CU.
After any transform to generate transform coefficients, video encoder 100 may perform quantization of the transform coefficients. Quantization exemplarily refers to a process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be reduced to an m-bit value during quantization, where n is greater than m.
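As a simplified numeric illustration of this bit-depth reduction (a real quantizer divides by a step size derived from the quantization parameter; the shift-only form below is an assumption for exposition):

```cpp
#include <cstdint>

// Round an n-bit coefficient down to an m-bit value via a right shift
// with a rounding offset; assumes n > m.
int32_t reduceBitDepth(int32_t coeff, uint32_t n, uint32_t m) {
    uint32_t shift = n - m;
    int32_t offset = 1 << (shift - 1);  // rounding offset
    return (coeff >= 0) ? (coeff + offset) >> shift
                        : -((-coeff + offset) >> shift);
}
```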
The JEM model further improves the coding structure of video images; in particular, a block coding structure called "quadtree plus binary tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC and supports more flexible CU partition shapes: a CU may be square or rectangular. A CTU is first partitioned by a quadtree, and the leaf nodes of the quadtree are further partitioned by a binary tree. There are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning. The leaf nodes of the binary tree are called CUs, and a CU in JEM cannot be further divided during prediction and transform; that is, a CU, a PU, and a TU in JEM have the same block size. In JEM at the present stage, the maximum size of the CTU is 256 × 256 luma pixels.
In some possible implementations, video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that may be entropy encoded. In other possible implementations, video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector according to context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method. Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
To perform CABAC, video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to whether adjacent values of the symbol are non-zero. To perform CAVLC, video encoder 100 may select a variable-length code for a symbol to be transmitted. Codewords in variable-length coding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, using VLC may achieve a bit rate saving relative to using equal-length codewords for each symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
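The shorter-codes-for-likelier-symbols principle can be illustrated with zeroth-order Exp-Golomb codes, a VLC family used for many H.26x syntax elements; this sketch is illustrative, not a normative encoder routine.

```cpp
#include <cstdint>
#include <string>

// Zeroth-order Exp-Golomb: smaller (assumed more probable) values get
// shorter codewords: 0 -> "1", 1 -> "010", 2 -> "011", 3 -> "00100", ...
std::string expGolomb(uint32_t value) {
    uint32_t v = value + 1;
    uint32_t bits = 0;
    for (uint32_t t = v; t > 1; t >>= 1) ++bits;  // bits = floor(log2(v))
    std::string code(bits, '0');                  // prefix of leading zeros
    for (int i = static_cast<int>(bits); i >= 0; --i)
        code += ((v >> i) & 1) ? '1' : '0';       // v in bits+1 binary digits
    return code;
}
```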
In embodiments of the present application, a video encoder may perform inter prediction to reduce temporal redundancy between pictures. As described previously, a CU may have one or more prediction units (PUs) according to the specifications of different video compression codec standards. In other words, multiple PUs may belong to a CU, or the PU and the CU have the same size. When the CU and PU sizes are the same, the partition mode of the CU is no partitioning, or partitioning into one PU, and the term PU is used uniformly herein for representation. When the video encoder performs inter prediction, the video encoder may signal the video decoder with motion information for the PU. For example, the motion information of the PU may include: a reference picture index, a motion vector, and a prediction direction identifier. The motion vector may indicate a displacement between the image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference block of the PU. The reference block of the PU may be a portion of a reference picture that is similar to the image block of the PU. The reference block may be located in a reference picture indicated by the reference picture index and the prediction direction identifier.
To reduce the number of coding bits needed to represent the motion information of the PU, the video encoder may generate a list of candidate predictive Motion Vectors (MVs) for each of the PUs according to a merge prediction mode or advanced motion vector prediction mode process. Each candidate predictive motion vector in the list of candidate predictive motion vectors for the PU may indicate motion information. The motion information indicated by some candidate predicted motion vectors in the candidate predicted motion vector list may be based on motion information of other PUs. The present application may refer to a candidate predicted motion vector as an "original" candidate predicted motion vector if the candidate predicted motion vector indicates motion information that specifies one of a spatial candidate predicted motion vector position or a temporal candidate predicted motion vector position. For example, for merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate predicted motion vector positions and one original temporal candidate predicted motion vector position. In some examples, the video encoder may generate additional candidate predicted motion vectors by combining partial motion vectors from different original candidate predicted motion vectors, modifying the original candidate predicted motion vectors, or inserting only zero motion vectors as candidate predicted motion vectors. These additional candidate predicted motion vectors are not considered as original candidate predicted motion vectors and may be referred to as artificially generated candidate predicted motion vectors in this application.
The techniques of this application generally relate to techniques for generating a list of candidate predictive motion vectors at a video encoder and techniques for generating the same list of candidate predictive motion vectors at a video decoder. The video encoder and the video decoder may generate the same candidate predicted motion vector list by implementing the same techniques for constructing the candidate predicted motion vector list. For example, both the video encoder and the video decoder may construct a list with the same number of candidate predicted motion vectors (e.g., five candidate predicted motion vectors). Video encoders and decoders may first consider spatial candidate predictive motion vectors (e.g., neighboring blocks in the same picture), then temporal candidate predictive motion vectors (e.g., candidate predictive motion vectors in different pictures), and finally may consider artificially generated candidate predictive motion vectors until the desired number of candidate predictive motion vectors has been added to the list. According to the techniques of this application, a pruning operation may be utilized during candidate predicted motion vector list construction for certain types of candidate predicted motion vectors in order to remove duplicates from the list, while for other types of candidate predicted motion vectors, pruning may not be used in order to reduce decoder complexity. For example, for a set of spatial candidate predicted motion vectors and for a temporal candidate predicted motion vector, a pruning operation may be performed to exclude candidate predicted motion vectors with duplicate motion information from the list. However, when an artificially generated candidate predicted motion vector is added to the list of candidate predicted motion vectors, it may be added without performing a pruning operation on it.
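A compact sketch of this construction order and selective pruning (the candidate derivation itself is elided; the names and the zero-vector filler are assumptions of this sketch):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Candidate {
    int mvx, mvy, refIdx;
    bool operator==(const Candidate& o) const {
        return mvx == o.mvx && mvy == o.mvy && refIdx == o.refIdx;
    }
};

// Spatial candidates first, then temporal, each pruned against the
// list; artificially generated candidates (here zero vectors) are then
// appended without pruning until the target length is reached.
std::vector<Candidate> buildCandidateList(const std::vector<Candidate>& spatial,
                                          const std::vector<Candidate>& temporal,
                                          std::size_t targetLen) {
    std::vector<Candidate> list;
    auto addPruned = [&](const Candidate& c) {
        if (list.size() < targetLen &&
            std::find(list.begin(), list.end(), c) == list.end())
            list.push_back(c);
    };
    for (const auto& c : spatial)  addPruned(c);
    for (const auto& c : temporal) addPruned(c);
    while (list.size() < targetLen)
        list.push_back({0, 0, 0});  // artificial candidates, no pruning
    return list;
}
```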
After generating the candidate predictive motion vector list for the PU of the CU, the video encoder may select a candidate predictive motion vector from the candidate predictive motion vector list and output a candidate predictive motion vector index in the codestream. The selected candidate predictive motion vector may be the candidate predictive motion vector having a motion vector that yields the predictor most closely matching the target PU being decoded. The candidate predicted motion vector index may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate predictive motion vector. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate predicted motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate predicted motion vector. The video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PUs of the CU and the original image block of the CU. The video encoder may then encode the one or more residual image blocks and output them in the codestream.
The codestream may include data identifying the selected candidate predictive motion vector in the candidate predictive motion vector list for the PU. The video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate predictive motion vector in the candidate predictive motion vector list for the PU. The video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on the one or more reference blocks of the PU. The video decoder may reconstruct the image block of the CU based on the predictive image blocks of the PUs of the CU and the one or more residual image blocks of the CU.
For ease of explanation, this application may describe locations or image blocks as having various spatial relationships with CUs or PUs. This description may be interpreted to mean that the locations or image blocks have various spatial relationships with the image blocks associated with the CU or PU. Furthermore, the present application may refer to the PU that is currently being decoded by the video decoder as the current PU, also referred to as the current image block to be processed. This application may refer to the CU that the video decoder is currently decoding as the current CU. The present application may refer to the picture that is currently being decoded by the video decoder as the current picture. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the term PU is used uniformly for representation.
As briefly described above, video encoder 100 may use inter prediction to generate predictive image blocks and motion information for the PUs of a CU. In many instances, the motion information of a given PU may be the same as or similar to the motion information of one or more nearby PUs (i.e., PUs whose image blocks are spatially or temporally near the image block of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may encode the motion information of the given PU with reference to the motion information of the nearby PUs. Encoding the motion information of the given PU with reference to the motion information of the nearby PUs may reduce the number of coding bits required in the codestream to indicate the motion information of the given PU.
Video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of the given PU is the same as the motion information of a nearby PU. This application uses "merge mode" to refer to indicating that the motion information of a given PU is the same as, or derivable from, the motion information of nearby PUs. In another possible implementation, video encoder 100 may calculate a Motion Vector Difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU. Video encoder 100 may then include the MVD, rather than the motion vector itself, in the motion information of the given PU. Representing the MVD in the codestream requires fewer coding bits than representing the motion vector of the given PU. This application uses "advanced motion vector prediction mode" to refer to signaling the motion information of a given PU to the decoding end by using an MVD and an index value identifying a candidate motion vector.
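The MVD relationship reduces to simple component-wise arithmetic, sketched below (the names are illustrative):

```cpp
#include <cstdint>

struct Mv { int32_t x, y; };

// Encoder side: transmit the difference between the actual motion
// vector and the predicted one.
Mv computeMvd(Mv actual, Mv predicted) {
    return {actual.x - predicted.x, actual.y - predicted.y};
}

// Decoder side: recover the motion vector from the prediction and MVD.
Mv recoverMv(Mv predicted, Mv mvd) {
    return {predicted.x + mvd.x, predicted.y + mvd.y};
}
```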
To signal motion information for a given PU at a decoding end using merge mode or AMVP mode, video encoder 100 may generate a list of candidate predictive motion vectors for the given PU. The candidate predictive motion vector list may include one or more candidate predictive motion vectors. Each of the candidate predictive motion vectors in the candidate predictive motion vector list for a given PU may specify motion information. The motion information indicated by each candidate predicted motion vector may include a motion vector, a reference picture index, and a prediction direction identification. The candidate predicted motion vectors in the candidate predicted motion vector list may comprise "original" candidate predicted motion vectors, where each indicates motion information for one of the specified candidate predicted motion vector positions within a PU that is different from the given PU.
After generating the list of candidate predictive motion vectors for the PU, video encoder 100 may select one of the candidate predictive motion vectors from the list of candidate predictive motion vectors for the PU. For example, the video encoder may compare each candidate predictive motion vector to the PU being decoded and may select a candidate predictive motion vector with the desired rate-distortion cost. Video encoder 100 may output the candidate prediction motion vector index for the PU. The candidate predicted motion vector index may identify a position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
Furthermore, video encoder 100 may generate the predictive picture block for the PU based on the reference block indicated by the motion information of the PU. The motion information for the PU may be determined based on motion information indicated by a selected candidate predictive motion vector in a list of candidate predictive motion vectors for the PU. For example, in merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In AMVP mode, the motion information for the PU may be determined based on the motion vector difference for the PU and the motion information indicated by the selected candidate prediction motion vector. Video encoder 100 may process the predictive image blocks for the PU as described previously.
When video decoder 200 receives the codestream, video decoder 200 may generate a candidate predicted motion vector list for each of the PUs of the CU. The candidate predicted motion vector list generated by video decoder 200 for the PU may be the same as the candidate predicted motion vector list generated by video encoder 100 for the PU. A syntax element parsed from the codestream may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. After generating the candidate predicted motion vector list for the PU, video decoder 200 may generate a predictive image block for the PU based on the one or more reference blocks indicated by the motion information of the PU. Video decoder 200 may determine the motion information of the PU based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. Video decoder 200 may reconstruct the image block of the CU based on the predictive image blocks of the PUs and the residual image blocks of the CU.
It should be understood that, in a possible implementation manner, at the decoding end, the construction of the candidate predicted motion vector list and the parsing of the selected candidate predicted motion vector from the code stream in the candidate predicted motion vector list are independent of each other, and may be performed in any order or in parallel.
In another possible implementation manner, at the decoding end, the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the codestream, and the candidate predicted motion vector list is then constructed only as far as that position. For example, when the parsed codestream indicates that the selected candidate predicted motion vector is the one with index 3 in the list, only the entries from index 0 to index 3 need to be constructed to determine that candidate, which reduces complexity and improves decoding efficiency.
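A sketch of this decoder-side shortcut; deriveNext is a placeholder for the normal derivation order (spatial, then temporal, then artificial candidates), and the interface is an assumption of this sketch.

```cpp
#include <cstddef>
#include <vector>

// Build the candidate list only up to the parsed index: once the entry
// at parsedIndex exists, no further candidates need to be derived.
template <typename Candidate, typename Derive>
Candidate candidateAt(std::size_t parsedIndex, Derive deriveNext) {
    std::vector<Candidate> list;
    while (list.size() <= parsedIndex)
        list.push_back(deriveNext(list));  // derives entries 0..parsedIndex only
    return list[parsedIndex];
}
```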
Fig. 2 is a block diagram of a video encoder 100 of one example described in an embodiment of the present application. The video encoder 100 is used to output the video to the post-processing entity 41. Post-processing entity 41 represents an example of a video entity, such as a media-aware network element (MANE) or a splicing/editing device, that may process the encoded video data from video encoder 100. In some cases, post-processing entity 41 may be an instance of a network entity. In some video encoding systems, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100. In some examples, post-processing entity 41 is an example of storage device 40 of fig. 1.
In the example of fig. 2, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a Decoded Picture Buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. Filter unit 106 is intended to represent one or more loop filters, such as deblocking filters, Adaptive Loop Filters (ALFs), and Sample Adaptive Offset (SAO) filters. Although filter unit 106 is shown in fig. 2 as an in-loop filter, in other implementations, filter unit 106 may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown).
The video data memory may store video data to be encoded by the components of video encoder 100. The video data stored in the video data memory may be obtained from video source 120. DPB 107 may be a reference picture memory that stores reference video data used by video encoder 100 to encode video data in intra or inter coding modes. The video data memory and DPB 107 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) including Synchronous Dynamic Random Access Memory (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
As shown in fig. 2, video encoder 100 receives video data and stores the video data in the video data memory. The partitioning unit partitions the video data into image blocks, and these image blocks may be further partitioned into smaller blocks, e.g., image block partitions based on a quadtree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units. Video encoder 100 generally illustrates components that encode image blocks within a video slice to be encoded. The slice may be divided into a plurality of image blocks (and possibly into sets of image blocks referred to as tiles). Prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes. Prediction processing unit 108 may provide the resulting intra- or inter-coded block to summer 112 to generate a residual block, and to summer 111 to reconstruct the encoded block for use as a reference picture.
An intra predictor 109 within prediction processing unit 108 may perform intra-predictive encoding of the current block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy. Inter predictor 110 within prediction processing unit 108 may perform inter-predictive encoding of the current block relative to one or more prediction blocks in one or more reference pictures to remove temporal redundancy.
In particular, the inter predictor 110 may be used to determine an inter prediction mode for encoding the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in the set of candidate inter prediction modes and select the inter prediction mode having the best rate-distortion characteristics. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original, unencoded block from which the encoded block was produced, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine the inter prediction mode in the candidate inter prediction mode set with the smallest rate-distortion cost for encoding the current image block as the inter prediction mode used for inter predicting the current image block.
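For exposition, this selection can be written as minimizing the Lagrangian cost J = D + λ·R over trial encodings; the structure below is a sketch with assumed names, where the distortion and bit counts would come from actually encoding the block in each mode.

```cpp
#include <limits>
#include <vector>

struct ModeTrial {
    int    mode;        // candidate inter prediction mode id
    double distortion;  // D: error between encoded and original block
    double bits;        // R: bits used to produce the encoded block
};

// Return the mode with the smallest rate-distortion cost J = D + lambda * R.
int selectInterMode(const std::vector<ModeTrial>& trials, double lambda) {
    int best = -1;
    double bestCost = std::numeric_limits<double>::infinity();
    for (const auto& t : trials) {
        double cost = t.distortion + lambda * t.bits;
        if (cost < bestCost) { bestCost = cost; best = t.mode; }
    }
    return best;
}
```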
The inter predictor 110 is configured to predict motion information (e.g., a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and acquire or generate a prediction block of the current image block using the motion information (e.g., the motion vector) of the one or more sub-blocks in the current image block. The inter predictor 110 may locate the prediction block to which the motion vector points in one of the reference picture lists. The inter predictor 110 may also generate syntax elements associated with the image block and the video slice for use by the video decoder 200 in decoding the image block of the video slice. Or, in an example, the inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block of each sub-block, so as to obtain a prediction block of the current image block; it should be understood that the inter predictor 110 herein performs motion estimation and motion compensation processes.
Specifically, after selecting the inter prediction mode for the current image block, the inter predictor 110 may provide information indicating the selected inter prediction mode for the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
The intra predictor 109 may perform intra prediction on the current image block. In particular, the intra predictor 109 may determine an intra prediction mode used to encode the current block. For example, the intra predictor 109 may calculate rate-distortion values for various intra prediction modes to be tested using rate-distortion analysis and select an intra prediction mode having the best rate-distortion characteristics from among the modes to be tested. In any case, after selecting the intra prediction mode for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode for the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After prediction processing unit 108 generates a prediction block for the current image block via inter prediction or intra prediction, video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 112 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more transform units (TUs) and applied to transformer 101. The transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a Discrete Cosine Transform (DCT) or a conceptually similar transform. Transformer 101 may convert the residual video data from the pixel value domain to a transform domain, e.g., the frequency domain.
The transformer 101 may send the resulting transform coefficients to the quantizer 102. Quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 102 may then perform a scan of a matrix that includes quantized transform coefficients. Alternatively, the entropy encoder 103 may perform a scan.
After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), Probability Interval Partition Entropy (PIPE) coding, or another entropy encoding method or technique. After entropy encoding by the entropy encoder 103, the encoded codestream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200. The entropy encoder 103 may also entropy encode syntax elements of the current image block to be encoded.
Inverse quantizer 104 and inverse transformer 105 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block of a reference image. The summer 111 adds the reconstructed residual block to the prediction block produced by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as blocking artifacts. The reconstructed image block is then stored as a reference block in the decoded picture buffer 107 and may be used by the inter predictor 110 as a reference block for inter predicting a block in a subsequent video frame or image.
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101 and correspondingly without processing by the inverse transformer 105; alternatively, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need to be processed by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; alternatively, the video encoder 100 may store the reconstructed picture block directly as a reference block without processing by the filter unit 106; alternatively, the quantizer 102 and the dequantizer 104 in the video encoder 100 may be combined together.
Fig. 3 is a block diagram of a video decoder 200 of one example described in an embodiment of the present application. In the example of fig. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a DPB 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, video decoder 200 may perform a decoding process that is substantially reciprocal to the encoding process described with respect to video encoder 100 from fig. 2.
In the decoding process, video decoder 200 receives from video encoder 100 an encoded video bitstream representing image blocks and associated syntax elements of an encoded video slice. Video decoder 200 may receive the video data from network entity 42 and, optionally, may store the video data in a video data memory (not shown). The video data memory may store video data, such as the encoded video bitstream, to be decoded by components of video decoder 200. The video data stored in the video data memory may be obtained, for example, from storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Although the video data memory is not illustrated in fig. 3, the video data memory and the DPB 207 may be the same memory or may be separately provided memories. The video data memory and DPB 207 may be formed from any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of video decoder 200, or disposed off-chip with respect to those components.
Network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or other such device for implementing one or more of the techniques described above. Network entity 42 may or may not include a video encoder, such as video encoder 100. Network entity 42 may implement portions of the techniques described in this application before network entity 42 sends the encoded video bitstream to video decoder 200. In some video decoding systems, network entity 42 and video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to network entity 42 may be performed by the same device that includes video decoder 200. In some cases, network entity 42 may be an example of storage 40 of fig. 1.
The entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. Video decoder 200 may receive syntax elements at the video slice level and/or the picture block level.
When a video slice is decoded as an intra-decoded (I) slice, intra predictor 209 of prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video slice is decoded as an inter-decoded (i.e., B or P) slice, inter predictor 210 of prediction processing unit 208 may determine, based on syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (e.g., perform inter prediction) based on the determined mode. Specifically, the inter predictor 210 may determine whether the current image block of the current video slice is predicted using a new inter prediction mode. If the syntax elements indicate that it is, the inter predictor 210 predicts the motion information of the current image block, or of a sub-block of the current image block, based on the new inter prediction mode (e.g., a new inter prediction mode designated by a syntax element, or a default new inter prediction mode), and then uses the predicted motion information to obtain or generate a prediction block for the current image block or sub-block through a motion compensation process. The motion information here may include reference picture information and motion vectors, where the reference picture information may include, but is not limited to, uni-/bi-directional prediction information, a reference picture list number, and a reference picture index corresponding to the reference picture list. For inter prediction, the prediction block may be generated from one of the reference pictures within one of the reference picture lists. Video decoder 200 may construct the reference picture lists, list 0 and list 1, based on the reference pictures stored in DPB 207. The reference frame index of the current picture may be included in one or more of reference frame list 0 and list 1. In some examples, video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is employed to decode a specific block, or a specific syntax element indicating both whether a new inter prediction mode is employed and which new inter prediction mode is specifically employed to decode the specific block. It should be understood that the inter predictor 210 here performs the motion compensation process.
The inverse quantizer 204 inversely quantizes, i.e., dequantizes, the quantized transform coefficients provided in the codestream and decoded by the entropy decoder 203. The inverse quantization process may include: the quantization parameter calculated by the video encoder 100 for each image block in the video slice is used to determine the degree of quantization that should be applied and likewise the degree of inverse quantization that should be applied. Inverse transformer 205 applies an inverse transform, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to generate a block of residues in the pixel domain.
After the inter predictor 210 generates a prediction block for the current image block or a sub-block of the current image block, the video decoder 200 obtains a reconstructed block, i.e., a decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210. Summer 211 represents the component that performs this summation operation. A loop filter (in or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality, if desired. Filter unit 206 may represent one or more loop filters, such as a deblocking filter, an Adaptive Loop Filter (ALF), and a Sample Adaptive Offset (SAO) filter. Although the filter unit 206 is shown in fig. 3 as an in-loop filter, in other implementations the filter unit 206 may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed blocks to reduce blocking distortion, and the result is output as a decoded video stream. The decoded image blocks in a given frame or picture may also be stored in the decoded picture buffer 207, and the DPB 207 stores reference pictures used for subsequent motion compensation. DPB 207 may be part of a memory that may also store decoded video for later presentation on a display device (e.g., display device 220 of fig. 1), or may be separate from such memory.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video stream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; alternatively, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients and accordingly does not need to be processed by the inverse quantizer 204 and the inverse transformer 205.
As noted previously, the techniques of this application illustratively relate to inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video codecs described in this application, including, for example, video encoder 100 and video decoder 200 as shown and described with respect to figs. 1-3. That is, in one possible implementation, the inter predictor 110 described with respect to fig. 2 may perform certain techniques described below when performing inter prediction during encoding of a block of video data. In another possible implementation, the inter predictor 210 described with respect to fig. 3 may perform certain techniques described below when performing inter prediction during decoding of a block of video data. Thus, reference to a generic "video encoder" or "video decoder" may include video encoder 100, video decoder 200, or another video encoding or decoding unit.
Fig. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application. The inter prediction module 121 may include, for example, a motion estimation unit and a motion compensation unit. The relationship between PU and CU varies among different video compression codec standards. Inter prediction module 121 may partition the current CU into PUs according to a plurality of partition modes. For example, inter prediction module 121 may partition the current CU into PUs according to 2N × 2N, 2N × N, N × 2N, and N × N partition modes. In an implementation manner, according to the technical solution of the embodiment of the present application, the inter prediction module 121 may also partition the current CU into PUs according to the determined size of the basic prediction block; in this scenario, the CU is the to-be-processed image block and each PU is a basic prediction block. In other embodiments, the current CU is not further partitioned and is itself the current PU; this is not limited.
Inter prediction module 121 may perform motion estimation on each of the PUs, obtaining its motion vector. In one implementation, Motion Estimation may include Integer Motion Estimation (IME) followed by Fractional Motion Estimation (FME). When the inter prediction module 121 performs IME on the PU, the inter prediction module 121 may search one or more reference pictures for a reference block for the PU. After finding the reference block for the PU, inter prediction module 121 may generate a motion vector that indicates, with integer precision, a spatial displacement between the PU and the reference block for the PU. When the inter prediction module 121 performs FME on a PU, the inter prediction module 121 may refine a motion vector generated by performing IME on the PU. The motion vectors generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.). After generating the motion vectors for the PU, inter prediction module 121 may use the motion vectors for the PU to generate a predictive image block for the PU. In a possible implementation manner, according to the technical solution of the embodiment of the present application, the inter prediction module 121 performs weighted calculation on one or more of a motion vector corresponding to a first reference block of a PU, a motion vector corresponding to a second reference block of the PU, and a motion vector corresponding to an original reference block having a preset position relationship with a CU to obtain a motion vector corresponding to the PU.
In some possible implementations where the inter prediction module 121 signals the motion information of the PU at the decoding end using AMVP mode, the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate predicted motion vectors may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors. After generating the candidate prediction motion vector list for the PU, inter prediction module 121 may select a candidate prediction motion vector from the candidate prediction motion vector list and generate a Motion Vector Difference (MVD) for the PU. The MVD for the PU may indicate a difference between the motion vector indicated by the selected candidate prediction motion vector and a motion vector generated for the PU using the IME and FME. In these possible implementations, the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. The inter prediction module 121 may also output the MVD of the PU.
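As an illustration of the AMVP-style signaling above, the sketch below selects the candidate predicted motion vector closest to the motion vector found by IME/FME and forms the MVD. The cost measure (plain L1 distance on motion vectors assumed to be in quarter-pel units) is a simplification; a real encoder would also weigh the bit cost of the candidate index.

```python
def amvp_signal(searched_mv, candidates):
    # searched_mv: MV obtained from IME + FME; candidates: the candidate
    # predicted motion vector list. Both are (x, y) integer tuples.
    def l1(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    best_idx = min(range(len(candidates)),
                   key=lambda i: l1(searched_mv, candidates[i]))
    mvd = (searched_mv[0] - candidates[best_idx][0],
           searched_mv[1] - candidates[best_idx][1])
    return best_idx, mvd  # index into the candidate list, plus the MVD

idx, mvd = amvp_signal((9, -3), [(8, -4), (0, 0), (12, -2)])
print(idx, mvd)  # 0 (1, 1): candidate (8, -4) chosen, MVD is (1, 1)
```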
In addition to generating motion information for the PUs by performing IME and FME on the PUs, inter prediction module 121 may also perform a Merge (Merge) operation on each of the PUs. When inter prediction module 121 performs a merge operation on a PU, inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU. The list of candidate prediction motion vectors for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors. The original candidate prediction motion vectors in the list may include one or more spatial candidate prediction motion vectors and temporal candidate prediction motion vectors. A spatial candidate prediction motion vector may indicate motion information of other PUs in the current picture. A temporal candidate prediction motion vector may be based on motion information of a corresponding PU in a picture other than the current picture. The temporal candidate prediction motion vector may also be referred to as Temporal Motion Vector Prediction (TMVP).
After generating the candidate prediction motion vector list, the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. Inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
After generating the predictive image block for the PU based on IME and FME, and the predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block generated by the IME/FME operations or the predictive image block generated by the merge operation. In some possible implementations, the inter prediction module 121 may make this selection based on a rate-distortion cost analysis of the two predictive image blocks.
After inter prediction module 121 has selected, for each of the partition modes, the predictive image blocks of the PUs generated by partitioning the current CU according to that mode (in some implementations, after the coding tree unit (CTU) is divided into CUs, a CU is not further divided into smaller PUs, in which case the PU is equivalent to the CU), inter prediction module 121 may select a partition mode for the current CU. In some implementations, the inter prediction module 121 may select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes. Inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to residual generation module 102. Inter prediction module 121 may output, to an entropy encoding module, syntax elements indicating the motion information of the PUs belonging to the selected partition mode.
In the diagram of fig. 4, the inter-frame prediction module 121 includes IME modules 180A-180N (collectively referred to as "IME module 180"), FME modules 182A-182N (collectively referred to as "FME module 182"), merging modules 184A-184N (collectively referred to as "merging modules 184"), PU mode decision modules 186A-186N (collectively referred to as "PU mode decision modules 186"), and a CU mode decision module 188 (which may also include performing a mode decision process from the CTU to the CU).
The IME module 180, FME module 182, and merge module 184 may perform IME operations, FME operations, and merge operations on PUs of the current CU. The inter prediction module 121 is illustrated in the schematic diagram of fig. 4 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU. In other possible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merging module 184 for each PU of each partition mode of the CU.
As illustrated in the schematic diagram of fig. 4, IME module 180A, FME module 182A, and merge module 184A may perform IME operations, FME operations, and merge operations on a PU generated by partitioning the CU according to the 2N × 2N partition mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, FME module 182A, and merge module 184A.
IME module 180B, FME module 182B, and merge module 184B may perform IME, FME, and merge operations on the left PU resulting from partitioning the CU according to the N × 2N partition mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, FME module 182B, and merge module 184B.
IME module 180C, FME module 182C, and merge module 184C may perform IME, FME, and merge operations on the right PU resulting from partitioning the CU according to the N × 2N partition mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, FME module 182C, and merge module 184C.
IME module 180N, FME module 182N, and merge module 184N may perform IME, FME, and merge operations on the bottom-right PU resulting from partitioning the CU according to the N × N partition mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, FME module 182N, and merge module 184N.
The PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of the multiple candidate predictive image blocks, choosing the one that provides the best rate-distortion cost for a given decoding scenario. For example, for bandwidth-limited applications, the PU mode decision module 186 may bias toward predictive image blocks that increase the compression ratio, while for other applications it may bias toward predictive image blocks that increase reconstructed video quality. After PU mode decision module 186 selects the predictive image blocks for the PUs of the current CU, CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partition mode.
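A minimal sketch of such a rate-distortion decision follows, assuming distortion is measured as a sum of squared differences and rate as a bit count; the Lagrange multiplier `lam` is a hypothetical tuning knob (larger values favour compression, matching the bandwidth-limited bias described above).

```python
def select_by_rd_cost(candidates, lam):
    # candidates: (label, distortion, rate_bits); cost = D + lambda * R.
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# Choosing between an IME/FME result and a merge result for one PU
# (the distortion and rate figures are made-up illustrative numbers).
print(select_by_rd_cost([("ime_fme", 1500.0, 42), ("merge", 1650.0, 9)],
                        lam=20.0))
# -> ('merge', 1650.0, 9): slightly worse distortion, far cheaper to signal
```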
Fig. 5 is a schematic diagram illustrating an exemplary to-be-processed image block and its reference blocks in an embodiment of the present application. As shown in fig. 5, W and H are the width and height of the to-be-processed image block 500 and of its co-located block in the specified reference image (referred to as the mapped image block) 500'. The content marked in each block in fig. 5 is the motion vector corresponding to that block. For example, P(x, y) labeled in fig. 5 is the motion vector corresponding to the basic prediction block 604 in the to-be-processed image block 500. The reference blocks of the to-be-processed image block include: the upper spatial neighboring blocks and the left spatial neighboring blocks of the to-be-processed image block, and the lower spatial neighboring blocks and the right spatial neighboring blocks of the mapped image block. The mapped image block is an image block in the specified reference image that has the same size and shape as the to-be-processed image block, and its position in the specified reference image is the same as the position of the to-be-processed image block in its own image (generally, the current image). The lower and right spatial neighboring blocks of the mapped image block may also be referred to as temporal reference blocks. Each frame image may be divided into image blocks for encoding, referred to as to-be-processed image blocks, which may be further divided into smaller blocks, referred to as basic prediction blocks. For example, the to-be-processed image block and the mapped image block may each be divided into a plurality of M × N sub-blocks, i.e., each sub-block has a size of M × N pixels, and assume that each reference block also has a size of M × N pixels, i.e., the same size as the sub-blocks of the to-be-processed image block. "M × N" and "M by N" are used interchangeably to refer to the pixel size of an image sub-block in the horizontal and vertical dimensions, i.e., M pixels in the horizontal direction and N pixels in the vertical direction, where M and N are non-negative integer values. Further, M and N are not necessarily equal.
For example, M may be equal to N, with both equal to 4 (i.e., the sub-block size is 4 × 4), or M may differ from N, e.g., M is 8 and N is 4 (i.e., the sub-block size is 8 × 4). In a possible embodiment, the sub-block size of the to-be-processed image block and the size of the reference block may be, for example, 4 × 4, 8 × 8, 8 × 4, or 4 × 8 pixels, or the minimum prediction block size allowed by the standard. In a possible embodiment, the measurement units of W and H are the width and height of the sub-block, respectively, i.e., W represents the ratio of the width of the to-be-processed image block to the width of its sub-blocks, and H represents the ratio of the height of the to-be-processed image block to the height of its sub-blocks. In addition, the to-be-processed image blocks described in this application may be understood as, but are not limited to: a Prediction Unit (PU), a Coding Unit (CU), a Transform Unit (TU), or the like. According to the specifications of different video compression codec standards, a CU may include one or more prediction units (PUs), or the PU and the CU may have the same size. To-be-processed image blocks may have fixed or variable sizes, which differ according to different video compression codec standards. Furthermore, a to-be-processed image block refers to an image block currently to be encoded or decoded, such as a prediction unit to be encoded or decoded. The to-be-processed image block may be a part or all of the image to be processed, which is not specifically limited in this application.
In one example, as shown in fig. 5, whether each left spatial neighboring block of the to-be-processed image block is available may be determined sequentially along direction 1, and whether each upper spatial neighboring block of the to-be-processed image block is available may be determined sequentially along direction 2. For example, it is determined whether a neighboring block is inter-coded: if the neighboring block exists and is inter-coded, the neighboring block is available; if the neighboring block does not exist or is intra-coded, the neighboring block is not available. In one possible implementation, if a neighboring block is intra-coded, the motion information of another neighboring reference block is copied as the motion information of that block. Whether the lower spatial neighboring blocks and the right spatial neighboring blocks of the mapped image block are available is detected in a similar manner, and details are not described herein again.
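A sketch of this availability scan follows, under the assumption (one possible implementation, as noted above) that an unavailable entry borrows the motion information of the most recently scanned available neighbour; the dict-based block representation is purely illustrative.

```python
def is_available(block):
    # Available = the neighbouring block exists and is inter-coded.
    return block is not None and block.get("mode") == "inter"

def scan_neighbours(neighbours):
    # Scan along direction 1 (left column) or direction 2 (upper row).
    last_mv, mvs = None, []
    for b in neighbours:
        if is_available(b):
            last_mv = b["mv"]
        # Unavailable entries reuse a previously scanned neighbour's motion
        # information (None if no available neighbour has been seen yet).
        mvs.append(last_mv)
    return mvs

left_column = [{"mode": "inter", "mv": (2, 0)}, {"mode": "intra"}, None]
print(scan_neighbours(left_column))  # [(2, 0), (2, 0), (2, 0)]
```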
It should be understood that motion information may be stored at different granularities. For example, in the H.264 and H.265 standards, a 4 × 4 pixel set is used as the basic unit for storing motion information. Illustratively, a 2 × 2, 8 × 8, 4 × 8, or 8 × 4 pixel set, or the like, may also be used as the basic unit for storing motion information. In this document, the basic unit for storing motion information is simply referred to as a basic storage unit.
When the size of the reference block is consistent with the size of the basic storage unit, the motion information stored in the basic storage unit corresponding to the reference block can be directly acquired as the motion information corresponding to the reference block.
Or, when the size of the reference block is smaller than the size of the basic storage unit, the motion information stored in the basic storage unit corresponding to the reference block may be directly acquired as the motion information corresponding to the reference block.
Alternatively, when the size of the reference block is larger than the size of the basic unit storing the motion information, the motion information stored in the corresponding basic storage unit at the predetermined position of the reference block may be acquired. For example, the motion information stored in the corresponding basic storage unit at the top left corner of the reference block may be obtained, or the motion information stored in the corresponding basic storage unit at the center point of the reference block may be obtained as the motion information corresponding to the reference block.
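The three cases above can be summarised in a short sketch; the 4 × 4 storage granularity and the centre-point rule are simply the examples given above (the top-left corner, also mentioned, would work the same way).

```python
def motion_info_for_reference_block(x, y, w, h, storage, unit=4):
    # storage[(ux, uy)] holds the motion information of the unit x unit
    # basic storage unit whose top-left pixel is (ux * unit, uy * unit).
    if w <= unit and h <= unit:
        # Reference block equal to or smaller than a basic storage unit:
        # read the corresponding unit directly.
        return storage[(x // unit, y // unit)]
    # Reference block larger than a basic storage unit: read the unit at a
    # predetermined position, here the centre point of the reference block.
    cx, cy = x + w // 2, y + h // 2
    return storage[(cx // unit, cy // unit)]

grid = {(0, 0): (1, -2), (1, 0): (3, 0), (0, 1): (0, 0), (1, 1): (5, 5)}
print(motion_info_for_reference_block(0, 0, 4, 4, grid))  # (1, -2): direct
print(motion_info_for_reference_block(0, 0, 8, 8, grid))  # (5, 5): centre unit
```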
In the embodiment of the present application, for convenience of description, a sub-block of an image to be processed is also referred to as a basic prediction block.
Fig. 6 exemplarily shows a schematic flowchart of an inter prediction method provided by an embodiment of the present application, in which the motion vector of each basic prediction block inside a to-be-processed image block is obtained by weighting the motion vectors corresponding to reference blocks of the to-be-processed image block. The method may include:
S601. Determine, according to size reference information, the size of the basic prediction block in the to-be-processed image block, where the size is used to determine the position of each basic prediction block in the to-be-processed image block.
The image block to be processed, i.e. the image block currently processed by the encoder or the decoder, is hereinafter referred to as the image block to be processed or the image block currently to be processed.
In a possible embodiment, the size reference information may be the shape of the basic prediction block, and the size of the basic prediction block in the to-be-processed image block may be a fixed value predetermined from the size reference information and hard-coded at the encoding end and the decoding end, respectively. The correspondence between different shapes and size values may be configured according to actual requirements, which is not specifically limited in this embodiment of the application.
Illustratively, when the side lengths of two adjacent sides of the basic prediction block are not equal, that is, the basic prediction block is a non-square rectangle, the side length of the shorter side of the basic prediction block is determined to be 4 or 8; when the side lengths of two adjacent sides of the basic prediction block are equal, that is, the basic prediction block is a square, the side length of the basic prediction block is determined to be 4 or 8. It should be understood that the above side length of 4 or 8 is only an exemplary value, and may be another constant such as 16, 24, or the like.
In a possible implementation manner, the size reference information may be a first identifier, where the first identifier is used to indicate the size of the basic prediction block, and the size of the basic prediction block in the to-be-processed image block may be determined by parsing the code stream and obtaining the first identifier from it. Specifically, S601 may be implemented as: receiving the code stream and parsing the first identifier from the code stream. The first identifier is located in the code stream segment corresponding to any one of: the Sequence Parameter Set (SPS) of the sequence in which the to-be-processed image block is located, the Picture Parameter Set (PPS) of the image in which the to-be-processed image block is located, and the slice header of the slice in which the to-be-processed image block is located.
In one possible implementation, the first identifier may be a syntax element, in which case S601 may specifically be implemented as: parsing the corresponding syntax element from the code stream, and thereby determining the size of the basic prediction block. The syntax element may be carried in the code stream portion corresponding to the SPS, the code stream portion corresponding to the PPS, or the code stream portion corresponding to the slice header.
It should be understood that when the first identifier is parsed from the SPS to determine the size of the basic prediction block, the basic prediction blocks in the entire sequence take the same size; when the first identifier is parsed from the PPS to determine the size of the basic prediction block, the basic prediction blocks in the entire image frame take the same size; and when the first identifier is parsed from the slice header to determine the size of the basic prediction block, the basic prediction blocks in the entire slice take the same size.
It should be understood that, in this context, an image and an image frame are different concepts, and an image includes an image in the form of an entire frame (i.e., an image frame), and also includes an image in the form of a slice (slice), an image in the form of a slice (tile), or an image in the form of other sub-images, without limitation.
It is to be understood that, for a slice employing intra prediction, since the size of the basic prediction block does not need to be determined, the slice header of such a slice does not carry the above-described first identifier.
Specifically, the encoding end determines the size of the basic prediction block in an appropriate manner (for example, by rate-distortion selection, by experimental empirical values, or in any manner other than the first-identifier-based determination of this embodiment), encodes the determined size of the basic prediction block into the code stream as the first identifier, and the decoding end parses the first identifier from the code stream to determine the size of the basic prediction block.
In a possible embodiment, the size reference information may include history information; the size of the basic prediction block in the to-be-processed image block is determined from the history information and can thus be obtained adaptively at the encoding end and the decoding end, respectively. The history information refers to information of image blocks that were encoded and decoded before the current to-be-processed image block. For example, the history information may include the sizes of the planar mode prediction blocks in a previously reconstructed image; in particular, the size of the basic prediction block may be determined according to the sizes of the planar mode prediction blocks in the previously reconstructed image. A planar mode prediction block is a to-be-processed image block that is inter predicted according to the inter prediction method provided by the embodiments of this application, and the previously reconstructed image is an image that precedes, in encoding order, the image in which the current to-be-processed image block is located.
It should be understood that when determining the basic prediction block size of the current image block to be processed, the planar mode prediction block in the previously reconstructed image is processed, and the planar mode prediction block is actually the image block that has been inter-predicted according to the inter-prediction method provided in the embodiment of the present application. The relevant paragraphs herein are explained accordingly and will not be repeated.
A to-be-processed image block that is inter predicted by the inter prediction method described in the embodiments of this application (for example, the method shown in fig. 6) is referred to as a planar mode prediction block. The size of the basic prediction block in the image in which the current to-be-processed image block is located (hereinafter referred to simply as the current image) can be estimated from statistics on the sizes of the planar mode prediction blocks in previously coded images.
It should be understood that the encoding order of the image at the encoding end and the decoding order of the image at the decoding end are consistent, and therefore, the previously reconstructed image is an image whose encoding order is before the image block to be processed, and can also be described as an image whose decoding order is before the image block to be processed. The encoding order and the decoding order are understood in the above manner and will not be described in detail herein.
It should be understood that when a reconstructed image A at the encoding end occupies the same position in encoding order as a reconstructed image B occupies in decoding order at the decoding end, image A and image B are the same image. Therefore, the encoding end and the decoding end can each perform the same analysis on the same reconstructed image to obtain the same prior information (which may also be referred to as history information); by determining the size of the basic prediction block from this prior information as the size reference information, the same result is obtained at both ends, that is, an adaptive mechanism for determining the size of the basic prediction block is implemented.
Specifically, when the size of the basic prediction block in the image block to be processed is determined by the history information, the size of the basic prediction block of the current image may be determined as follows: calculating an average of the products of width and height of all plane mode prediction blocks in the previously reconstructed image; determining a size of the basic prediction block as a first size when the average value is less than a threshold; when the average is greater than or equal to the threshold, determining the size of the basic prediction block to be a second size. Wherein the first size is smaller than the second size.
It should be noted that specific values of the first size and the second size may be configured according to actual requirements, and this is not specifically limited in this embodiment of the application.
It is to be understood that the first dimension is smaller than the second dimension, and that the area of the first dimension is smaller than the area of the second dimension. Illustratively, the relationship between the first size and the second size may include, without limitation, that the first size is 4 (square side length) and the second size is 8 (square side length), or that the first size is 4 × 4 and the second size is 8 × 8, or that the first size is 4 × 4 and the second size is 4 × 8, or that the first size is 4 × 8 and the second size is 8 × 16.
It should be understood that, in general, the above threshold value is preset. The value of the threshold and the determination rule are not specifically limited, and can be configured according to actual requirements.
In a possible implementation manner, when the picture order count (POC, indicating display order) of every reference frame of the image in which the to-be-processed image block is located is smaller than the POC of that image, the above threshold is a first threshold; when the POC of at least one reference frame of the image in which the to-be-processed image block is located is greater than the POC of that image, the threshold is a second threshold. The first threshold and the second threshold are different.
That is, when encoding in a low latency (low delay) manner, when the POC of the reference frames of the current picture are all smaller than the POC of the current picture, the threshold is set to the first value. Illustratively, the first value may be set to 75. When encoding in random access (random access) mode, when the POC of the at least one reference frame of the current picture is greater than the POC of the current picture, the threshold is set to a second value, which may be set to 27, for example. It is to be understood that the arrangement of the first and second values is not limiting.
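Putting the pieces together, a minimal sketch of this history-based decision, using the example thresholds above (75 for low delay, 27 for random access) and assuming square first/second sizes of 4 and 8; all of these constants are the illustrative values from the text, not mandated ones. The fallback for invalid statistics anticipates the embodiment discussed a few paragraphs below.

```python
def basic_block_size_from_history(planar_blocks, low_delay,
                                  first_size=4, second_size=8):
    # planar_blocks: (width, height) of every planar mode prediction block
    # in the previously reconstructed image.
    if not planar_blocks:
        # Statistics invalid (no planar mode prediction blocks): fall back
        # to a preset value, e.g. 4x4 for a square basic prediction block.
        return first_size
    avg_area = sum(w * h for w, h in planar_blocks) / len(planar_blocks)
    threshold = 75 if low_delay else 27   # example first/second thresholds
    return first_size if avg_area < threshold else second_size

# Mostly small planar blocks -> average area 21.3 < 27 -> first size (4).
print(basic_block_size_from_history([(4, 4), (4, 4), (8, 4)], low_delay=False))
```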
In a possible implementation manner, the previous reconstructed image is a reconstructed image whose encoding sequence is closest to the image where the image block to be processed is located, that is, the previous reconstructed image is a reconstructed image whose decoding sequence is closest to the image where the current image block to be processed is located.
That is, the statistical information of all the plane mode prediction blocks in the previous encoded/decoded frame of the current image frame (illustratively, the average of the products of the widths and heights of all the plane mode prediction blocks) is taken as the size reference information, or the statistical information of all the plane mode prediction blocks in the previous slice of the current slice is taken as the size reference information. Correspondingly, in S601, the size of the basic prediction block in the current image frame may be determined according to the statistical information of all the plane mode prediction blocks in the previous encoding/decoding frame of the current image frame, or the size of the basic prediction block in the current slice may be determined according to the statistical information of all the plane mode prediction blocks in the previous slice of the current slice. As previously mentioned, the image may also include other forms of sub-images and, therefore, is not limited to image frames and slices.
It should be understood that in this embodiment, the statistical information is updated in units of image frames or slices, i.e., once per image frame or slice.
It should be understood that no update of statistical information is performed in image frames or slices that employ intra prediction.
In a feasible implementation manner, the previously reconstructed image is, among the images having the same temporal layer identifier as the image in which the current to-be-processed image block is located, the reconstructed image whose encoding order is closest to that image; equivalently, it is the reconstructed image, among the images with the same temporal layer identifier, whose decoding order is closest to the image in which the current to-be-processed image block is located.
Namely, the image which is closest to the current image in coding distance is determined from the images which have the same temporal layer identification (temporal ID) as the image of the current image block to be processed. The specific manner may refer to the previous possible implementation manner, which is not described in detail.
In one possible embodiment, the previously reconstructed image may be a plurality of images, and correspondingly, calculating an average value of products of widths and heights of all plane mode prediction blocks in the previously reconstructed image may include: an average of the products of width and height of all plane mode predicted blocks in a plurality of previously reconstructed images is calculated.
It should be understood that the above two possible embodiments determine the size of the basic prediction block of the current to-be-processed image block from the statistics of a single previously reconstructed image, while in the present embodiment the statistics of a plurality of previously reconstructed images are accumulated for that purpose. That is, in this embodiment the statistical information is updated in units of a plurality of image frames or a plurality of slices, i.e., once per preset number of image frames or per preset number of slices; alternatively, the statistical information may be accumulated continuously and never reset.
Specifically, calculating an average of the products of width and height of all plane mode prediction blocks in a plurality of previously reconstructed images may include: respectively counting the average value of the product of the width and the height of all plane mode prediction blocks in each image of a plurality of previously reconstructed images, and weighting the respectively counted average values to obtain a final average value for comparison with the threshold value in the embodiment; alternatively, calculating an average of the products of width and height of all plane mode prediction blocks in a plurality of previously reconstructed images may include: the average value used for comparison with the threshold value in the present embodiment is obtained by accumulating the product of the width and height of all the plane mode prediction blocks in a plurality of previously reconstructed images and dividing the result by the number of all the plane mode prediction blocks.
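The two averaging variants just described, in sketch form; the equal weights in the first variant are an assumption, since the text leaves the weighting unspecified.

```python
def avg_weighted_per_image(images, weights=None):
    # Variant 1: average the per-image means, then weight them
    # (equal weights assumed here, as the weighting is not specified).
    means = [sum(w * h for w, h in blocks) / len(blocks) for blocks in images]
    weights = weights or [1.0 / len(means)] * len(means)
    return sum(m * wt for m, wt in zip(means, weights))

def avg_pooled(images):
    # Variant 2: accumulate all width x height products over every image,
    # then divide by the total number of planar mode prediction blocks.
    total = sum(w * h for blocks in images for w, h in blocks)
    count = sum(len(blocks) for blocks in images)
    return total / count

imgs = [[(4, 4), (8, 8)], [(4, 8)]]      # planar blocks of two prior images
print(avg_weighted_per_image(imgs))      # (40 + 32) / 2 = 36.0
print(avg_pooled(imgs))                  # (16 + 64 + 32) / 3 = 37.33...
```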
In a possible embodiment, the method further comprises determining that the statistical information is valid in calculating an average of the products of width and height of all the planar-mode prediction blocks in the previously reconstructed image. For example, if there is no plane mode prediction block in the previously reconstructed image, the above average cannot be calculated, and the statistical information is invalid. Correspondingly, the statistical information may not be updated, or the size of the basic prediction block of the current to-be-processed image block may be set to a preset value. For example, for a square basic prediction block, when the statistics are invalid, the size of the basic prediction block of the current to-be-processed image block may be set to 4 × 4.
It should be understood that, in the embodiment of determining the size of the basic prediction block using the history information for the first image using inter prediction, the size of the basic prediction block may also be set to a preset value.
In a possible embodiment, determining the size of the basic prediction block in the image block to be processed in S601 further includes determining the shape of the basic prediction block. For example, when the to-be-processed image block is a square, it may be determined that the basic prediction block is also a square, or the aspect ratio of the to-be-processed image block is consistent with the aspect ratio of the basic prediction block, or the width and the height of the to-be-processed image block are equally divided into several equal parts to obtain the width and the height of the basic prediction block, respectively, or the shape of the to-be-processed image block is not related to the shape of the basic prediction block. For example, the basic prediction block may be fixedly set to be square, or when the size of the to-be-processed image block is 32 × 16, the basic prediction block may be set to be 16 × 8 or 8 × 4, and the like, without limitation.
It should be understood that in one possible implementation, the determination of the basic prediction block shape is fixed at the codec end separately and remains consistent.
In a possible embodiment, before determining the size of the basic prediction block in the image block to be processed according to the size reference information in S601, the inter-prediction method provided by the present application may further include: and determining the prediction direction of the image block to be processed.
In a possible implementation, determining the prediction direction of the image block to be processed may include: when the first direction prediction is effective and the second direction prediction is invalid, or the second direction prediction is effective and the first direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction; when the first-direction prediction is effective and the second-direction prediction is effective, the prediction direction of the image block to be processed is bidirectional prediction.
The first direction prediction and the second direction prediction refer to predictions in two different directions, and are not specifically limited to the prediction directions. For example, the first directional prediction may be forward prediction and the second directional prediction may be backward prediction; alternatively, the first directional prediction may be backward prediction and the second directional prediction may be forward prediction.
In a possible implementation, the first direction prediction is valid when at least one temporary image block in a neighboring area of the image block to be processed adopts the first reference frame image list to obtain a motion vector; when no temporary image block in the adjacent area of the image block to be processed adopts the first reference frame image list to obtain a motion vector, the first direction prediction is invalid; when at least one temporary image block in the adjacent area of the image block to be processed adopts a second reference frame image list to obtain a motion vector, the second-direction prediction is effective; and when no temporary image block in the adjacent area of the image block to be processed adopts the second reference frame image list to obtain the motion vector, the second-direction prediction is invalid.
Wherein the first reference frame picture list is a reference frame picture list corresponding to a first directional prediction, and the second reference frame picture list is a reference frame picture list corresponding to a second directional prediction.
In one possible embodiment, the temporary image block is an image block having a preset size. The value of the preset size can be determined according to actual requirements, and this is not specifically limited in the embodiment of the present application.
In another possible implementation, the motion vector may include a first motion vector and/or a second motion vector. The first direction prediction is valid when, among the temporary image blocks in the neighboring area of the to-be-processed image block that obtain their motion vectors using the first reference frame picture list, at least two have different first motion vectors, where the first motion vector corresponds to the first reference frame picture list; when the first motion vectors of all such temporary image blocks are the same, the first direction prediction is invalid. Likewise, the second direction prediction is valid when, among the temporary image blocks in the neighboring area that obtain their motion vectors using the second reference frame picture list, at least two have different second motion vectors, where the second motion vector corresponds to the second reference frame picture list; when the second motion vectors of all such temporary image blocks are the same, the second direction prediction is invalid.
In one possible embodiment, when the temporary image block obtains the motion vector using only the first reference frame image list, the first motion vector and the motion vector are the same; when the temporary image block obtains the motion vector using only the second reference frame image list, the second motion vector is the same as the motion vector.
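A sketch of the simpler validity rule (at least one neighbouring temporary image block uses the corresponding reference frame picture list); the stricter variant above, which additionally requires at least two distinct motion vectors, could be added by comparing the collected vectors. The dict-based block representation is illustrative.

```python
def direction_valid(temp_blocks, list_id):
    # Valid when at least one temporary image block in the neighbouring
    # area derives a motion vector from reference frame picture list list_id.
    return any(list_id in b["lists"] for b in temp_blocks)

def prediction_direction(temp_blocks):
    first = direction_valid(temp_blocks, 0)    # e.g. forward prediction
    second = direction_valid(temp_blocks, 1)   # e.g. backward prediction
    if first and second:
        return "bidirectional"
    if first or second:
        return "unidirectional"
    return "none"   # neither list is used by any neighbouring block

neighbourhood = [{"lists": {0}}, {"lists": {0, 1}}]
print(prediction_direction(neighbourhood))     # bidirectional
```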
In a possible implementation, the neighboring area of the image block to be processed may include: one or any combination of the left spatial domain region, the upper spatial domain region, the right temporal domain region, and the lower temporal domain region of the image block to be processed.
In another possible implementation, the neighboring area of the image block to be processed may include: the method comprises the steps of processing an image block to be processed in a mode of one or any combination of a left side airspace region, an upper side airspace region, a left lower side airspace region, a right upper side airspace region, a right side time domain region and a lower side time domain region.
In a possible implementation, the size reference information may include the prediction direction of the to-be-processed image block and/or shape information of the to-be-processed image block, where the shape information may include height and width. Accordingly, determining the size of the basic prediction block in the to-be-processed image block in S601 may include: determining the size of the basic prediction block according to the prediction direction and/or the shape information of the to-be-processed image block.
In a possible embodiment, determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may include: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels, or when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
In a possible embodiment, the determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may be implemented as: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
In a possible embodiment, determining the size of the basic prediction block according to the prediction direction of the image block to be processed in S601 may include: when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels.
In a possible embodiment, the determining the size of the basic prediction block according to the prediction direction of the to-be-processed image block includes: when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
In a possible embodiment, determining the size of the basic prediction block according to the shape information of the image to be processed in S601 may include: when the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels; when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height is 8 pixels.
In one possible implementation, the basic prediction block may have a width of 8 pixels and a height of 8 pixels. In this embodiment, the content of the size reference information is not particularly limited.
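To make one of these mappings concrete, the following sketch implements the embodiment that combines prediction direction with block shape (unidirectional: 4 × 4; bidirectional: 8 × 4 when the to-be-processed block is at least as wide as it is high, otherwise 4 × 8); the other embodiments above differ only in the returned constants.

```python
def basic_block_size(direction, block_w, block_h):
    # Returns (width, height) of the basic prediction block in pixels.
    if direction == "unidirectional":
        return 4, 4
    # Bidirectional prediction: orient the 8x4 block along the longer side
    # of the to-be-processed image block.
    return (8, 4) if block_w >= block_h else (4, 8)

print(basic_block_size("unidirectional", 32, 16))  # (4, 4)
print(basic_block_size("bidirectional", 32, 16))   # (8, 4)
print(basic_block_size("bidirectional", 16, 32))   # (4, 8)
```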
It should be understood that the foregoing describes contents of several different pieces of size reference information, and provides a specific implementation of S601 corresponding to the contents of the different pieces of size reference information, and the contents of the size reference information may also be a combination of the foregoing contents, and the specific implementation of S601 is not described again in the case of the combination.
For example, the size reference information may be the first identifier together with the prediction direction of the to-be-processed image block: the first identifier indicates a value range for the size of the basic prediction block, and the size of the basic prediction block is then determined within that range according to the prediction direction of the to-be-processed image block.
For example, the size reference information may be a first identifier and a shape of the to-be-processed image block, the first identifier indicates a value range of the size of the basic prediction block, and the size of the basic prediction block is determined in the value range according to the shape of the to-be-processed image block.
In a possible implementation manner, after step S601, the inter-frame prediction method provided by the present application further includes:
S602. Divide the to-be-processed image block into a plurality of basic prediction blocks according to the size of the basic prediction block, and determine the position of each basic prediction block in the to-be-processed image block in turn.
It should be understood that the size of each basic prediction block is the same, and after the size of the basic prediction block is determined, the position of each basic prediction block can be deduced sequentially by size in the image block to be processed.
It should be understood that, in a possible embodiment, the positions of the to-be-processed image block and of the basic prediction blocks are both expressed as coordinates, so this step only requires determining the coordinates of each basic prediction block; alternatively, the to-be-processed image block and the basic prediction blocks are distinguished logically, without a materialized partitioning step.
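A sketch of S602's coordinate enumeration, assuming (as the embodiments above imply) that the block dimensions are multiples of the basic prediction block size:

```python
def basic_prediction_block_positions(block_w, block_h, bpb_w, bpb_h):
    # Top-left coordinate of every basic prediction block, in raster order.
    return [(x, y)
            for y in range(0, block_h, bpb_h)
            for x in range(0, block_w, bpb_w)]

# A 16x8 to-be-processed block divided into 8x4 basic prediction blocks:
print(basic_prediction_block_positions(16, 8, 8, 4))
# [(0, 0), (8, 0), (0, 4), (8, 4)]
```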
S603. Determine a first reference block and a second reference block of each basic prediction block according to the position of that basic prediction block.
Here, the left boundary of the first reference block is collinear with the left boundary of the basic prediction block, the upper boundary of the second reference block is collinear with the upper boundary of the basic prediction block, the first reference block is adjacent to the upper boundary of the to-be-processed image block, and the second reference block is adjacent to the left boundary of the to-be-processed image block.
S604, performing weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block and the motion vector corresponding to the original reference block with a preset position relation with the image block to be processed to obtain the motion vector corresponding to the basic prediction block.
Specifically, the operation of S604 is performed on each basic prediction block divided in S602, and a motion vector corresponding to each basic prediction block is obtained. The process of performing S604 for each basic prediction block is the same and is not described in detail.
In a possible implementation, the original reference block having a preset positional relationship with the image block to be processed may include: an original reference block having a preset spatial position relationship with the image block to be processed, and/or an original reference block having a preset temporal position relationship with the image block to be processed. Of course, the original reference block having the preset positional relationship with the image block to be processed may also be defined according to actual requirements, which is not specifically limited in the embodiments of the present application.
In a possible embodiment, the original reference block having a preset spatial position relationship with the image block to be processed may include one or more of: an image block located at the upper left corner of the image block to be processed and adjacent to its upper left corner point, an image block located at the upper right corner of the image block to be processed and adjacent to its upper right corner point, and an image block located at the lower left corner of the image block to be processed and adjacent to its lower left corner point. The original reference block having the preset spatial position relationship with the image block to be processed is located outside the image block to be processed, and is referred to below simply as a spatial reference block.
In a possible implementation, the original reference block having a preset temporal position relationship with the image block to be processed may include: an image block located, in a target reference frame, at the lower right corner of a mapped image block and adjacent to the lower right corner point of the mapped image block. The original reference block having the preset temporal position relationship with the image block to be processed is located outside the mapped image block; the size of the mapped image block is equal to that of the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in its own image frame. Such a block is referred to below simply as a temporal reference block.
In one possible implementation, the index information and the reference frame list information of the target reference frame may be obtained by parsing the code stream. That is, the code stream includes index information of the target reference frame and reference frame list information, and the index information of the target reference frame is searched in the reference frame list information, so that the target reference frame can be determined.
In a possible implementation manner, the index information and the reference frame list information of the target reference frame may be located in a code stream segment corresponding to a slice head of a slice in which the to-be-processed image block is located.
Specific implementations of steps S603 and S604 are described in detail below by way of example. Figs. 7, 8, and 9 each illustrate a scenario of weighted calculation of the motion vector corresponding to a basic prediction block.
In a possible implementation manner, in the scenario shown in fig. 7, the specific implementation of S603 and S604 may include the following steps:
S701, determining a first reference block 809 and a second reference block 802 according to the position of the basic prediction block 604 in the image block 600 to be processed.
The motion vector corresponding to the first reference block is A(x, -1), and the motion vector corresponding to the second reference block is L(-1, y).
S702A, performing weighted calculation based on the motion vector corresponding to the spatial reference block 805 in the upper right corner of the to-be-processed image block 600 and the motion vector corresponding to the temporal reference block 807 in the lower right corner of the to-be-processed image block 600, to obtain a motion vector corresponding to the first temporary block 806.
For example, the motion vector of the first temporary block 806 is calculated as R(W, y) = ((H - y - 1) × AR + (y + 1) × BR)/H.
Wherein AR is the motion vector corresponding to the image block (spatial reference block 805) located at the upper right corner of the to-be-processed image block 600 and adjacent to the upper right corner point of the to-be-processed image block 600, BR is the motion vector corresponding to the image block located, in the target reference frame, at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block, H is the ratio of the height of the to-be-processed image block 600 to the height of the basic prediction block 604, and y is the ratio of the vertical distance of the upper left corner point of the basic prediction block 604 relative to the upper left corner point of the to-be-processed image block 600 to the height of the basic prediction block 604. The index information and the reference frame list information of the target reference frame are obtained by parsing the slice header, whereby the target reference frame is determined.
S702B, performing weighted calculation based on the motion vector corresponding to the spatial domain reference block 801 at the lower left corner of the image block 600 to be processed and the motion vector corresponding to the temporal domain reference block 807 at the lower right corner of the image block 600 to be processed, and obtaining a motion vector corresponding to the second temporary block 808.
For example, the motion vector of the second temporary block 808 is calculated as B(x, H) = ((W - x - 1) × BL + (x + 1) × BR)/W.
Wherein BL is the motion vector corresponding to the image block (spatial reference block 801) located at the lower left corner of the to-be-processed image block 600 and adjacent to the lower left corner point of the to-be-processed image block, BR is the motion vector corresponding to the image block located, in the target reference frame, at the lower right corner of the mapped image block and adjacent to the lower right corner point of the mapped image block, W is the ratio of the width of the to-be-processed image block 600 to the width of the basic prediction block 604, and x is the ratio of the horizontal distance of the upper left corner point of the basic prediction block 604 relative to the upper left corner point of the to-be-processed image block 600 to the width of the basic prediction block 604.
It should be understood that no execution order is imposed between step S702A and step S702B; they may be performed in sequence or simultaneously.
S703A, performing weighted calculation based on the motion vector corresponding to the first temporary block 806 of the to-be-processed image block 600 and the motion vector corresponding to the second reference block 802 of the to-be-processed image block 600, to obtain a first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604.
Illustratively, the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604 is calculated as P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y).
S703B, performing weighted calculation based on the motion vector corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain a second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
Illustratively, the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604 is calculated as P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H).
It should be understood that no execution order is imposed between step S703A and step S703B.
S704, performing weighted calculation based on the first temporary motion vector P_h(x, y) and the second temporary motion vector P_v(x, y) to obtain the motion vector P(x, y) corresponding to the basic prediction block 604.
For example, the motion vector P(x, y) corresponding to the basic prediction block 604 is calculated as P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W)/(2 × H × W).
It should be understood that, in a possible embodiment, the motion vector P(x, y) corresponding to the basic prediction block 604 can also be obtained by a single formula integrating the above steps.
Illustratively, the single formula integrating the above steps for the motion vector P(x, y) corresponding to the basic prediction block 604 is:
P(x, y) = (H × ((W - 1 - x) × L(-1, y) + (x + 1) × ((H - y - 1) × AR + (y + 1) × BR)/H) + W × ((H - 1 - y) × A(x, -1) + (y + 1) × ((W - x - 1) × BL + (x + 1) × BR)/W) + H × W)/(2 × H × W)
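The following Python sketch restates S701 to S704 for the Fig. 7 scenario. It assumes motion vectors are (mvx, mvy) tuples and uses floating-point division; the fixed-point rounding and clipping of a real codec are deliberately omitted, and all function names are hypothetical:

```python
def _add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def _mul(s, u):
    return (s * u[0], s * u[1])

def planar_combine(x, y, W, H, A, L, R, B):
    """S703A/S703B/S704: combine boundary motion vectors into the motion
    vector of the basic prediction block at grid position (x, y). W and H
    are the width and height of the image block to be processed in units
    of basic prediction blocks; A and L are the motion vectors of the
    first and second reference blocks; R and B are the motion vectors of
    the temporary blocks on the right and bottom boundaries."""
    Ph = _add(_mul(W - 1 - x, L), _mul(x + 1, R))    # P_h(x, y)
    Pv = _add(_mul(H - 1 - y, A), _mul(y + 1, B))    # P_v(x, y)
    # The H*W term plays the role of a rounding offset in the embodiment.
    num = _add(_add(_mul(H, Ph), _mul(W, Pv)), (H * W, H * W))
    return _mul(1 / (2 * H * W), num)                # P(x, y)

def planar_mv_fig7(x, y, W, H, A, L, AR, BL, BR):
    """S702A/S702B (Fig. 7): derive the temporary motion vectors R(W, y)
    and B(x, H) from the top-right spatial (AR), bottom-left spatial (BL)
    and bottom-right temporal (BR) reference blocks, then combine."""
    R = _mul(1 / H, _add(_mul(H - y - 1, AR), _mul(y + 1, BR)))  # R(W, y)
    B = _mul(1 / W, _add(_mul(W - x - 1, BL), _mul(x + 1, BR)))  # B(x, H)
    return planar_combine(x, y, W, H, A, L, R, B)
```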
In another possible implementation manner, in the scenario shown in fig. 8, the specific implementation of S603 and S604 may include the following steps:
S801, determining a first reference block 809 and a second reference block 802 according to the position of the basic prediction block 604 in the image block 600 to be processed.
The motion vector corresponding to the first reference block is A(x, -1), and the motion vector corresponding to the second reference block is L(-1, y).
S802A, taking the motion vector corresponding to the spatial reference block 805 at the upper right corner of the image block 600 to be processed as the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed.
S802B, taking the motion vector corresponding to the spatial reference block 801 at the lower left corner of the image block 600 to be processed as the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed.
It should be understood that no execution order is imposed between step S802A and step S802B; they may be performed sequentially or simultaneously.
S803A, performing weighted calculation based on the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed and the motion vector corresponding to the second reference block 802 of the image block 600 to be processed, to obtain the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604.
Illustratively, the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604 is calculated as P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y).
S803B, performing weighted calculation based on the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain a second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
Illustratively, the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604 is calculated as P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H).
It should be understood that no execution order is imposed between step S803A and step S803B; they may be performed sequentially or simultaneously.
S804, performing weighted calculation based on the first temporary motion vector and the second temporary motion vector to obtain the motion vector P(x, y) corresponding to the basic prediction block 604.
For example, the motion vector P(x, y) corresponding to the basic prediction block 604 may be calculated as P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W)/(2 × H × W).
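Under the same assumptions as the sketch above, the Fig. 8 scenario can reuse planar_combine unchanged; only the derivation of the temporary motion vectors differs:

```python
def planar_mv_fig8(x, y, W, H, A, L, AR, BL):
    """Fig. 8 variant: S802A/S802B take the motion vectors of the
    top-right (AR) and bottom-left (BL) spatial reference blocks directly
    as R(W, y) and B(x, H); the combination step is unchanged."""
    return planar_combine(x, y, W, H, A, L, R=AR, B=BL)
```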
In another possible implementation manner, in the scenario shown in fig. 9, the specific implementation of S603 and S604 may include the following steps:
S901, a first reference block 809 and a second reference block 802 are determined according to the position of the basic prediction block 604 in the image block 600 to be processed.
The motion vector corresponding to the first reference block is A(x, -1), and the motion vector corresponding to the second reference block is L(-1, y).
S902, determining a first temporary block 806 and a second temporary block 808 according to the position of the basic prediction block 604 in the image block 600 to be processed.
The first temporary block is the image block located at the position of block 806 in the target reference frame, the second temporary block is the image block located at the position of block 808 in the target reference frame, and both the first temporary block and the second temporary block are temporal reference blocks.
S903A, performing weighted calculation based on the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed and the motion vector corresponding to the second reference block 802 of the image block 600 to be processed, to obtain the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604.
Illustratively, the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604 is calculated as P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y).
S903B, performing weighted calculation based on the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain a second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
Illustratively, the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604 is calculated as P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H).
It should be understood that no execution order is imposed between step S903A and step S903B.
S904, performing weighted calculation based on the first temporary motion vector P_h(x, y) and the second temporary motion vector P_v(x, y) to obtain the motion vector P(x, y) corresponding to the basic prediction block 604.
For example, the motion vector P(x, y) corresponding to the basic prediction block 604 is calculated as P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W)/(2 × H × W).
In another possible implementation manner, in the scenario shown in fig. 9, the specific implementation of S603 and S604 may include the following steps:
S0101, a first reference block 809 and a second reference block 802 are determined according to the position of the basic prediction block 604 in the image block 600 to be processed.
The motion vector corresponding to the first reference block is A(x, -1), and the motion vector corresponding to the second reference block is L(-1, y).
S0102, performing motion compensation according to the motion information of any spatial reference block of the image block 600 to be processed, and determining the reference frame information and the position of the motion compensation block.
The spatial reference block may be any available left-side or upper-side spatial neighboring block shown in fig. 5. Illustratively, it may be the first available left-side spatial neighboring block detected along direction 1 in fig. 5, or the first available upper-side spatial neighboring block detected along direction 2 in fig. 5; it may also be the first available spatial neighboring block obtained by detecting a plurality of preset spatial reference blocks of the image block 600 to be processed in a preset order, such as the order L → A → AR → BL → AL in fig. 7; the spatial neighboring block may also be selected according to another predetermined rule, which is not limited here.
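A minimal sketch of this detection, assuming availability has already been resolved into a mapping from hypothetical candidate names to motion information (or None when a neighbor is unavailable):

```python
def first_available_spatial_neighbor(candidates,
                                     order=("L", "A", "AR", "BL", "AL")):
    """Sketch of the detection in S0102: scan preset spatial neighbors of
    the image block to be processed in a fixed order and return the first
    available one. `candidates` maps a hypothetical candidate name to its
    motion information, or to None when that neighbor is unavailable."""
    for name in order:
        info = candidates.get(name)
        if info is not None:
            return name, info
    return None  # no available spatial neighbor

# Example: if L is unavailable but A is available, A is selected:
# first_available_spatial_neighbor({"L": None, "A": (3, -1)})
```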
S0103, a first temporary block 806 and a second temporary block 808 are determined according to the position of the basic prediction block 604 in the image block 600 to be processed.
The first temporary block is the image block located at the position of block 806 within the motion compensation block in the reference frame determined according to the reference frame information in step S0102, and the second temporary block is the image block located at the position of block 808 within that motion compensation block; both the first temporary block and the second temporary block are temporal reference blocks.
S0104A, performing weighted calculation based on the motion vector R(W, y) corresponding to the first temporary block 806 of the image block 600 to be processed and the motion vector corresponding to the second reference block 802 of the image block 600 to be processed, to obtain the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604.
Illustratively, the first temporary motion vector P_h(x, y) corresponding to the basic prediction block 604 is calculated as P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y).
S0104B, performing weighted calculation based on the motion vector B(x, H) corresponding to the second temporary block 808 of the image block 600 to be processed and the motion vector corresponding to the first reference block 809 of the image block 600 to be processed, to obtain a second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604.
Illustratively, the second temporary motion vector P_v(x, y) corresponding to the basic prediction block 604 is calculated as P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H).
It should be understood that no execution order is imposed between step S0104A and step S0104B.
S0105, performing weighted calculation based on the first temporary motion vector P_h(x, y) and the second temporary motion vector P_v(x, y) to obtain the motion vector P(x, y) corresponding to the basic prediction block 604.
For example, the motion vector P(x, y) corresponding to the basic prediction block 604 is calculated as P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W)/(2 × H × W).
It should be noted that the specific implementations of S603 and S604 described above in conjunction with fig. 7 to fig. 9 are only exemplary, and are not limiting to the specific implementations of S603 and S604.
The relationship between an image block and the basic storage unit storing motion information was mentioned above; the motion information stored in the basic storage unit corresponding to an image block is referred to here as the actual motion information of the image block, and the motion information includes a motion vector and the index information of the reference frame pointed to by the motion vector. It should be understood that the reference frame index information of the respective reference blocks used in the weighted calculation of the motion vector of the basic prediction block cannot be guaranteed to be consistent. When the reference frame index information of each reference block is consistent, the motion information corresponding to a reference block is the actual motion information of that reference block. When the reference frame index information of the reference blocks is inconsistent, the actual motion vector of a reference block needs to be weighted according to the distance relationship of the reference frames indicated by the reference frame indexes, and the motion information corresponding to that reference block is then the motion vector obtained by weighting the motion vector in its actual motion information.
Specifically, the target reference picture index may, for example, be fixed to 0, 1 or another index value, or may be the reference picture index with the highest frequency of use in the reference picture list, for example the reference picture index pointed to most often by the actual motion vectors of all reference blocks or by the weighted motion vectors.
It is judged whether the reference frame index information of each reference block is the same as the target reference picture index;
if the reference frame index information of a certain reference block differs from the target reference picture index, the actual motion vector of that reference block is scaled based on the ratio of the temporal distance between the picture of the reference block and the reference picture indicated by its actual motion information (reference frame index information) to the temporal distance between the picture of the reference block and the reference picture indicated by the target reference picture index, to obtain the weighted motion vector.
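The following sketch illustrates such scaling, assuming the conventional codec behavior in which the motion vector is multiplied by the ratio of the distance to the target reference picture to the distance to the actual reference picture; poc values stand in for picture order counts, and the fixed-point arithmetic and clipping of a real codec are omitted:

```python
def scale_mv_to_target(mv, cur_poc, actual_ref_poc, target_ref_poc):
    """Sketch of the weighting described above: when the reference frame
    of a reference block differs from the one indicated by the target
    reference picture index, scale its actual motion vector by the ratio
    of temporal distances (picture order count differences)."""
    td_actual = cur_poc - actual_ref_poc   # distance to the actual reference
    td_target = cur_poc - target_ref_poc   # distance to the target reference
    if td_actual == td_target or td_actual == 0:
        return mv                          # no scaling needed or possible
    scale = td_target / td_actual
    return (mv[0] * scale, mv[1] * scale)

# A vector (4, -2) pointing one picture back, rescaled to a reference two
# pictures back, becomes (8, -4).
```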
In a possible implementation manner, after step S604, the method for inter-frame prediction provided by the embodiment of the present application may further include:
and S605, performing motion compensation on the image block to be processed based on the obtained basic prediction block motion vector.
In one possible embodiment, S605 includes the following steps: adjacent basic prediction blocks having the same motion information are first merged, and motion compensation is then performed with the merged image block as the unit of motion compensation.
Specifically, horizontal merging is performed first: for each row of basic prediction blocks in the image block to be processed, it is judged from left to right whether the motion information (for example, including a motion vector, a reference frame list, and reference frame index information) of a basic prediction block and of its adjacent basic prediction block are the same. When the motion information is the same, the two adjacent basic prediction blocks are merged, and it is then judged whether the motion information of the next basic prediction block adjacent to the merged block is the same as that of the merged block; merging stops once the motion information of the adjacent basic prediction block differs from that of the merged block, and the step of merging adjacent basic prediction blocks with the same motion information continues with the differing basic prediction block as a new starting point, until the end of the row of basic prediction blocks.
Then, vertical merging is performed: for each horizontally merged block or unmerged basic prediction block, it is judged whether its lower edge completely coincides with the upper edge of another such block. If the edges completely coincide and the two blocks (or horizontally merged blocks) have the same motion information, they are merged, and the step of merging adjacent blocks with the same motion information and coinciding upper and lower edges is continued on the vertically merged block, until no block satisfying the above conditions remains in the image block to be processed.
And finally, motion compensation is performed by taking the merged basic prediction blocks as motion compensation units.
In a possible embodiment, the merging mode used for merging adjacent basic prediction blocks with the same motion information is related to the shape of the image block to be processed, as illustrated by the sketch below. Illustratively, when the width of the to-be-processed image block is greater than or equal to its height, the basic prediction blocks are merged only in the horizontal manner described above. When the width of the to-be-processed image block is smaller than its height, for each column of basic prediction blocks in the image block to be processed, it is judged from top to bottom whether the motion information (for example, including a motion vector, a reference frame list, and reference frame index information) of a basic prediction block and of its adjacent basic prediction block are the same. When the motion information is the same, the two adjacent basic prediction blocks are merged, and it is then judged whether the motion information of the next basic prediction block adjacent to the merged block is the same as that of the merged block; merging stops once the motion information of the adjacent basic prediction block differs from that of the merged block, and the step of merging adjacent basic prediction blocks with the same motion information continues with the differing basic prediction block as a new starting point, until the end of the column of basic prediction blocks.
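The sketch below illustrates only the directional run-merging pass (rows for a wide block, columns for a tall one); the second, cross-direction pass described above is omitted for brevity, and all names are hypothetical:

```python
def merge_runs(motion, horizontal=True):
    """Sketch of the merging before motion compensation: group adjacent
    basic prediction blocks with identical motion information into runs.
    `motion` is a 2-D grid (list of rows) of comparable motion-information
    values; a wide image block merges along rows (horizontal=True), a
    tall one along columns. Returns (row_or_col, start, length, info)."""
    grid = motion if horizontal else [list(col) for col in zip(*motion)]
    runs = []
    for idx, line in enumerate(grid):
        start = 0
        for pos in range(1, len(line) + 1):
            # Close the current run at a change of motion info or line end.
            if pos == len(line) or line[pos] != line[start]:
                runs.append((idx, start, pos - start, line[start]))
                start = pos
    return runs

# merge_runs([["a", "a", "b"], ["c", "c", "c"]]) ->
# [(0, 0, 2, 'a'), (0, 2, 1, 'b'), (1, 0, 3, 'c')]
```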
In a possible implementation manner, before step S601, as shown in fig. 6, the inter prediction method provided in the embodiment of the present application may further include:
and S606, determining that the first reference block and the second reference block are positioned in the image boundary where the image block to be processed is positioned.
That is, when the upper boundary line of the image block to be processed and the upper boundary line of the image in which the image block to be processed is located coincide, the first reference block does not exist, and the scheme in the embodiment of the present application is not applicable. When the left boundary of the image block to be processed and the left boundary of the image in which the image block to be processed is located coincide, the second reference block does not exist, and the scheme in the embodiment of the present application is also not applicable.
In a possible implementation manner, before step S601, as shown in fig. 6, the inter prediction method provided in the embodiment of the present application may further include:
and S607, determining that the shape of the image block to be processed meets the preset condition.
In S607, when it is determined that the shape of the image block to be processed satisfies the preset condition, S601 is performed, otherwise, it is not performed.
For example, the preset condition may be that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or that the width of the image block to be processed is greater than or equal to 16; or that the height of the image block to be processed is greater than or equal to 16.
That is, when the width of the image block to be processed is less than 16 or the height is less than 16, the scheme in the embodiment of the present application is not applicable, or, when the width of the image block to be processed is less than 16 and the height is less than 16, the scheme in the embodiment of the present application is not applicable.
It should be understood that 16 is used here as an example threshold; other values such as 8, 24, and 32 may also be used, and the thresholds corresponding to the width and the height may be unequal, which is not limited.
It should be understood that step S606 and step S607 may be performed in combination. For example, in one possible implementation, the inter prediction scheme in the embodiment of the present application cannot be used when the to-be-processed image block adjoins the left boundary or the upper boundary of the image, or when the width or the height of the to-be-processed image block is less than 16; in another possible implementation, the scheme cannot be used when the to-be-processed image block adjoins the left boundary or the upper boundary of the image, or when both the width and the height of the to-be-processed image block are less than 16. A sketch of such an eligibility check follows.
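A sketch of the combined check in S606/S607, assuming (x0, y0) is the top-left position of the image block within the picture and taking the variant in which both the width and the height must reach the threshold:

```python
def planar_mode_applicable(x0, y0, width, height, min_size=16):
    """Sketch of S606/S607: the mode requires a first reference block in
    the row above and a second reference block in the column to the left,
    so an image block touching the upper or left picture boundary is
    excluded; blocks below a size threshold are excluded as well. The
    threshold 16 is one example; 8, 24 or 32 are equally possible, and
    the width and height thresholds may differ."""
    if x0 == 0 or y0 == 0:   # coincides with the left or upper boundary
        return False
    return width >= min_size and height >= min_size
```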
Although particular aspects of the present application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of the present application may be applied with many other video encoding and/or decoding units, processors, processing units, hardware-based coding units such as encoders/decoders (CODECs), and the like. Moreover, it should be understood that the steps shown and described with respect to fig. 6 are provided only as possible implementations. That is, the steps shown in the possible implementation of fig. 6 need not necessarily be performed in the order shown in fig. 6, and fewer, additional, or alternative steps may be performed.
In the case that any two pieces of motion information of a plurality of available neighboring blocks of a current image block are not the same, predicting the motion information of one or more sub-blocks in the current image block based on the planar mode includes: in the case that the motion vector corresponding to the first reference picture list in the motion information of one available neighboring block is not the same as the motion vector corresponding to the first reference picture list in the motion information of another available neighboring block (i.e., the first reference picture list is valid), and/or the motion vector corresponding to the second reference picture list in the motion information of one available neighboring block is not the same as the motion vector corresponding to the second reference picture list in the motion information of another available neighboring block (i.e., the second reference picture list is valid), predicting, based on the planar mode, the motion vector corresponding to the first reference picture list and/or the motion vector corresponding to the second reference picture list in the motion information of the one or more sub-blocks in the current image block.
When the first reference picture list and the second reference picture list are both valid, bidirectional prediction is performed on the current block;
when only one of the first reference picture list and the second reference picture list is valid, unidirectional prediction is performed on the current block.
The plurality of available neighboring blocks may be: all available left-side spatial neighboring blocks of the current image block and all available upper-side spatial neighboring blocks of the current image block; or all available right-side temporal neighboring blocks of the current image block and all available lower-side temporal neighboring blocks of the current image block; or all available left-side spatial neighboring blocks, all available upper-side spatial neighboring blocks, all available right-side temporal neighboring blocks, and all available lower-side temporal neighboring blocks of the current image block.
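The sketch below illustrates the validity rule in the stricter form above (a list is valid only when two available neighboring blocks carry different motion vectors for it); the field names 'L0' and 'L1' are hypothetical:

```python
def planar_prediction_direction(neighbors):
    """Sketch of the rule above: a reference picture list is valid when
    two available neighboring blocks carry different motion vectors for
    it. `neighbors` is a list of dicts with optional 'L0'/'L1' motion
    vectors (hypothetical field names). Returns 'bi', 'uni-L0', 'uni-L1',
    or None when neither list is valid."""
    def list_valid(key):
        mvs = [n[key] for n in neighbors if n.get(key) is not None]
        return len(mvs) >= 2 and any(mv != mvs[0] for mv in mvs[1:])

    l0, l1 = list_valid("L0"), list_valid("L1")
    if l0 and l1:
        return "bi"      # both lists valid: bidirectional prediction
    if l0:
        return "uni-L0"  # only list 0 valid: unidirectional prediction
    if l1:
        return "uni-L1"
    return None
```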
Fig. 10 is a schematic block diagram of an inter-prediction apparatus 1000 in the embodiment of the present application. Specifically, the inter-prediction apparatus 1000 may include: a determining module 1001, a positioning module 1002, and a calculating module 1003.
Wherein, the determining module 1001 is configured to determine, according to the size reference information, a size of the basic prediction block in the image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed.
A positioning module 1002, configured to determine a first reference block and a second reference block of the basic prediction block according to the position of the basic prediction block determined by the determining module 1001, where a left boundary of the first reference block is collinear with a left boundary of the basic prediction block, an upper boundary of the second reference block is collinear with an upper boundary of the basic prediction block, the first reference block is adjacent to the upper boundary of the image block to be processed, and the second reference block is adjacent to the left boundary of the image block to be processed;
a calculating module 1003, configured to perform weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset positional relationship with the to-be-processed image block, so as to obtain a motion vector corresponding to the basic prediction block.
The determining module 1001 supports the inter-prediction apparatus 1000 in performing S601 and the like in the above embodiments, and/or other processes of the techniques described herein. The positioning module 1002 supports the inter-prediction apparatus 1000 in performing S603 and the like in the above embodiments, and/or other processes of the techniques described herein. The calculating module 1003 supports the inter-prediction apparatus 1000 in performing S604, S605 and the like in the above embodiments, and/or other processes of the techniques described herein.
Further, as shown in fig. 10, the inter-prediction apparatus 1000 may further include a dividing unit 1004, which supports the apparatus 1000 in performing S602 and the like in the above embodiments, and/or other processes of the techniques described herein.
Further, as shown in fig. 10, the inter-prediction apparatus 1000 may further include a determining unit 1005, which supports the apparatus 1000 in performing S606 and S607 and the like in the above embodiments, and/or other processes of the techniques described herein.
For all relevant details of each step of the above method embodiment, reference may be made to the functional description of the corresponding functional module; details are not described herein again.
Fig. 11 is a schematic block diagram of an implementation manner of an inter-prediction apparatus 1100 according to an embodiment of the present application. The inter-prediction apparatus 1100 may include a processor 1110, a memory 1130, and a bus system 1150. The processor is connected to the memory through the bus system; the memory is used to store instructions, and the processor is used to execute the instructions stored in the memory. The memory of the apparatus stores program code, and the processor may call the program code stored in the memory to perform the various video encoding or decoding methods described herein, in particular the video encoding or decoding methods in the various new inter prediction modes and the methods of predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here again.
The memory 1130 may include a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of memory device may also be used for memory 1130. Memory 1130 may include code and data 1131 that are accessed by processor 1110 using bus 1150. The memory 1130 may further include an operating system 1133 and application programs 1135, the application programs 1135 including at least one program that allows the processor 1110 to perform the video encoding or decoding methods described herein, and in particular the inter-prediction methods described herein. For example, the applications 1135 may include applications 1 through N, which further include video encoding or decoding applications (simply video coding applications) that perform the video encoding or decoding methods described herein.
The bus system 1150 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, however, the various buses are designated in the figure as the bus system 1150.
Optionally, the apparatus 1100 for inter-prediction may also include one or more output devices, such as a display 1170. In one example, the display 1170 may be a touch sensitive display that incorporates a display with touch sensitive elements operable to sense touch input. A display 1170 may be connected to the processor 1110 via the bus 1150.
For all relevant details of each scenario of the method embodiment, reference may be made to the functional description of the corresponding functional module; details are not described herein again.
The inter-frame prediction apparatus 1000 and the inter-frame prediction apparatus 1100 may both perform the inter-frame prediction method shown in fig. 6, and the inter-frame prediction apparatus 1000 and the inter-frame prediction apparatus 1100 may be specifically a video coding and decoding apparatus or other devices with video coding and decoding functions. The inter-prediction apparatus 1000 and the inter-prediction apparatus 1100 may be used for image prediction in a coding process.
An embodiment of the present application provides a decoding apparatus, which includes the inter-frame prediction device described in any of the above embodiments. The decoding device may be a video decoder or a video encoder.
The present application further provides a terminal, including: one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled to the one or more processors; the memory is used to store computer program code, the computer program code includes instructions, and when the one or more processors execute the instructions, the terminal performs the inter-prediction method of the embodiments of the present application.
The terminal can be a video display device, a smart phone, a portable computer and other devices which can process video or play video.
Another embodiment of the present application also provides a computer-readable storage medium including one or more program codes, the one or more program codes including instructions which, when executed by a processor in a terminal, cause the terminal to perform the inter-prediction method shown in fig. 6.
In another embodiment of the present application, there is also provided a computer program product comprising computer executable instructions stored in a computer readable storage medium; the computer-executable instructions may be read by the at least one processor of the terminal from a computer-readable storage medium, and execution of the computer-executable instructions by the at least one processor causes the terminal to implement a method of performing inter prediction as shown in fig. 6.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware or any combination thereof. When implemented using a software program, the implementation may take, entirely or partially, the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Moreover, it is to be understood that certain actions or events of any of the methods described herein can be performed in a different sequence, added, combined, or left out together (e.g., not all described actions or events are necessary for the practice of the methods), depending on the possible implementations. Further, in certain possible implementations, acts or events may be performed concurrently, e.g., via multi-threaded processing, interrupt processing, or multiple processors, rather than sequentially. Additionally, although specific aspects of the disclosure are described as being performed by a single module or unit for purposes of clarity, it should be understood that the techniques of this disclosure may be performed by a combination of units or modules associated with a video decoder.
In one or more possible implementations, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, corresponding to tangible media such as data storage media, or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as according to a communication protocol.
In this manner, the computer-readable medium illustratively may correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described herein. The computer program product may include a computer-readable medium.
Such computer-readable storage media may include, as a possible implementation and not limitation, RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that may be used to store desired code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, as used herein, the term "processor" may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Likewise, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including wireless handsets, Integrated Circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described previously, the various units may be combined in a codec hardware unit or provided by an interoperative hardware unit (including one or more processors as described previously) in conjunction with a collection of suitable software and/or firmware.
The above description is only an exemplary embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (53)

1. A method of inter-prediction, comprising:
determining the size of a basic prediction block in an image block to be processed according to size reference information, wherein the size is used for determining the position of the basic prediction block in the image block to be processed;
determining a first reference block and a second reference block of the basic prediction block according to the position; wherein a left boundary of the first reference block and a left boundary of the basic prediction block are collinear, an upper boundary of the second reference block and an upper boundary of the basic prediction block are collinear, the first reference block is adjacent to an upper boundary of the image block to be processed, and the second reference block is adjacent to a left boundary of the image block to be processed;
and performing weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block and the motion vector corresponding to the original reference block having a preset position relation with the image block to be processed so as to obtain the motion vector corresponding to the basic prediction block.
2. The method according to claim 1, wherein the original reference block having a predetermined positional relationship with the image block to be processed comprises: and the original reference block has a preset spatial domain position relation with the image block to be processed, and/or the original reference block has a preset time domain position relation with the image block to be processed.
3. The method according to claim 2, wherein the original reference block having a preset spatial position relationship with the image block to be processed comprises: one or more of an image block which is positioned at the upper left corner of the image block to be processed and is adjacent to the upper left corner point of the image block to be processed, an image block which is positioned at the upper right corner of the image block to be processed and is adjacent to the upper right corner point of the image block to be processed, and an image block which is positioned at the lower left corner of the image block to be processed and is adjacent to the lower left corner point of the image block to be processed; and the original reference block which has a preset spatial domain position relation with the image block to be processed is positioned outside the image block to be processed.
4. The method according to claim 2 or 3, wherein the original reference block having a preset temporal position relationship with the image block to be processed comprises: an image block which is positioned at the lower right corner of a mapping image block in a target reference frame and is adjacent to the lower right corner of the mapping image block; the original reference block which has a preset time domain position relation with the to-be-processed image block is located outside the mapping image block, the size of the mapping image block is equal to that of the to-be-processed image block, and the position of the mapping image block in the target reference frame is the same as that of the to-be-processed image block in the image frame of the to-be-processed image block.
5. The method of claim 4, wherein the index information and the reference frame list information of the target reference frame are obtained by parsing the code stream.
6. The method according to claim 5, wherein the index information and reference frame list information of the target reference frame are located in a code stream segment corresponding to a slice head of a slice in which the to-be-processed image block is located.
7. The method according to any one of claims 4 to 6, wherein the performing weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and the motion vector corresponding to the original reference block having a preset positional relationship with the image block to be processed to obtain the motion vector corresponding to the basic prediction block includes:
the motion vector corresponding to the basic prediction block is obtained according to the following formula:
P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W)/(2 × H × W)
wherein,
P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR)/H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR)/W;
wherein the AR is a motion vector corresponding to the image block located at the upper right corner of the image block to be processed and adjacent to the upper right corner point of the image block to be processed, the BR is a motion vector corresponding to the image block located, in a target reference frame, at the lower right corner of a mapped image block and adjacent to the lower right corner point of the mapped image block, the BL is a motion vector corresponding to the image block located at the lower left corner of the image block to be processed and adjacent to the lower left corner point of the image block to be processed, the x is a ratio of the horizontal distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the width of the basic prediction block, the y is a ratio of the vertical distance of the upper left corner point of the basic prediction block relative to the upper left corner point of the image block to be processed to the height of the basic prediction block, the H is a ratio of the height of the image block to be processed to the height of the basic prediction block, the W is a ratio of the width of the image block to be processed to the width of the basic prediction block, the L(-1, y) is a motion vector corresponding to the second reference block, the A(x, -1) is a motion vector corresponding to the first reference block, and the P(x, y) is the motion vector corresponding to the basic prediction block.
8. The method according to any one of claims 1 to 7, wherein the size reference information includes a first identifier; the first flag is used to indicate a size of the basic prediction block;
the method further comprises the following steps: receiving a code stream, and analyzing the code stream to obtain the first identifier; the first identifier is located in a code stream segment corresponding to any one of a sequence parameter set of a sequence where the image blocks to be processed are located, an image parameter set of an image where the image blocks to be processed are located, and a strip header of a strip where the image blocks to be processed are located.
9. The method according to any of claims 1 to 8, further comprising, before said determining the size of the basic prediction block in the to-be-processed image block according to the size reference information:
and determining the prediction direction of the image block to be processed.
10. The method according to claim 9, wherein said determining the prediction direction of the image block to be processed comprises:
when a first direction prediction is effective and a second direction prediction is invalid, or the second direction prediction is effective and the first direction prediction is invalid, the prediction direction of the image block to be processed is a unidirectional prediction;
when the first-direction prediction is effective and the second-direction prediction is effective, the prediction direction of the image block to be processed is bidirectional prediction.
11. The method of claim 10,
when at least one temporary image block in the neighboring area of the image block to be processed obtains a motion vector using a first reference frame image list, the first-direction prediction is effective;
when no temporary image block in the neighboring area of the image block to be processed obtains a motion vector using the first reference frame image list, the first-direction prediction is invalid;
when at least one temporary image block in the neighboring area of the image block to be processed obtains a motion vector using a second reference frame image list, the second-direction prediction is effective;
and when no temporary image block in the neighboring area of the image block to be processed obtains a motion vector using the second reference frame image list, the second-direction prediction is invalid.
12. The method according to claim 10, wherein the motion vector comprises a first motion vector and/or a second motion vector, and the first-direction prediction is effective when the first motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using a first reference frame image list are different, wherein the first motion vector corresponds to the first reference frame image list;
when the first motion vectors of all temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the first reference frame image list are the same, the first-direction prediction is invalid;
when the second motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using a second reference frame image list are different, the second-direction prediction is effective, wherein the second motion vector corresponds to the second reference frame image list;
and when the second motion vectors of all temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the second reference frame image list are the same, the second-direction prediction is invalid.
13. The method according to any of claims 11 to 12, wherein the temporary image block is an image block having a preset size.
14. The method according to any one of claims 11 to 12, wherein the neighboring area of the image block to be processed comprises: one or any combination of the left spatial region, the upper spatial region, the right temporal region, and the lower temporal region of the image block to be processed.
15. The method according to any of claims 1 to 9, wherein the size reference information comprises shape information of the image block to be processed; the shape information includes a width and a height;
the determining the size of the basic prediction block according to the size reference information comprises:
when the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
16. The method according to any of claims 1 to 9, wherein the basic prediction block has a width of 8 pixels and a height of 8 pixels.
17. The method according to any of the claims 10 to 14, wherein the size reference information comprises a prediction direction of the image block to be processed.
18. The method of claim 17, wherein the determining the size of the basic prediction block according to the size reference information comprises:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels, or the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
19. The method of claim 17, wherein the determining the size of the basic prediction block according to size reference information comprises:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height of the basic prediction block is 8 pixels.
20. The method of claim 17, wherein the determining the size of the basic prediction block according to size reference information comprises:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels.
21. The method of claim 17, wherein the determining the size of the basic prediction block according to size reference information comprises:
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels;
when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height of the basic prediction block is 8 pixels.
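Claims 18 to 21 are alternative mappings from the prediction direction (and, in some variants, the block shape) to the basic prediction block size. A minimal Python sketch of the claim-19 variant, with the direction passed as a boolean, which is an assumption of the sketch:

    def basic_block_size_by_direction(bidirectional: bool,
                                      block_w: int, block_h: int):
        # Claim 19: 4x4 for unidirectional prediction; for bidirectional
        # prediction, 8x4 when the image block to be processed is at least
        # as wide as it is high, and 4x8 otherwise.
        # Returns (width, height) in pixels.
        if not bidirectional:
            return (4, 4)
        return (8, 4) if block_w >= block_h else (4, 8)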
22. The method according to any of claims 1 to 21, further comprising, after the determining the size of the basic prediction block in the image block to be processed:
dividing the image block to be processed into a plurality of basic prediction blocks according to the size;
and sequentially determining the position of each basic prediction block in the image block to be processed.
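A minimal Python sketch of this division step, assuming the dimensions of the image block to be processed are multiples of the basic prediction block size:

    def basic_block_positions(block_w, block_h, sub_w, sub_h):
        # Claim 22: divide the image block to be processed into basic
        # prediction blocks of the determined size and enumerate the
        # top-left offset of each one, row by row.
        for y in range(0, block_h, sub_h):
            for x in range(0, block_w, sub_w):
                yield (x, y)

For a 16x8 block with 8x4 basic prediction blocks, this yields (0, 0), (8, 0), (0, 4) and (8, 4).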
23. The method according to any of claims 1 to 22, wherein before the determining the size of the basic prediction block in the image block to be processed according to the size reference information, the method further comprises:
determining that the first reference block and the second reference block are located within the boundary of the image in which the image block to be processed is located.
24. The method according to any of claims 1 to 23, wherein before the determining the size of the basic prediction block in the image block to be processed according to the size reference information, the method further comprises:
determining that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or determining that the width of the image block to be processed is greater than or equal to 16; or, determining that the height of the image block to be processed is greater than or equal to 16.
25. The method according to any of the claims 1 to 24, wherein said method is used for encoding said image block to be processed or for decoding said image block to be processed.
26. An apparatus for inter-frame prediction, comprising:
a determining module, configured to determine, according to size reference information, a size of a basic prediction block in an image block to be processed, where the size is used to determine a position of the basic prediction block in the image block to be processed;
a positioning module, configured to determine a first reference block and a second reference block of the basic prediction block according to the position, wherein a left boundary of the first reference block and a left boundary of the basic prediction block are collinear, an upper boundary of the second reference block and an upper boundary of the basic prediction block are collinear, the first reference block is adjacent to an upper boundary of the image block to be processed, and the second reference block is adjacent to a left boundary of the image block to be processed;
and a calculating module, configured to perform a weighted calculation on one or more of the motion vector corresponding to the first reference block, the motion vector corresponding to the second reference block, and a motion vector corresponding to an original reference block having a preset positional relationship with the image block to be processed, to obtain a motion vector corresponding to the basic prediction block.
27. The apparatus according to claim 26, wherein the original reference block having the preset positional relationship with the image block to be processed comprises: an original reference block having a preset spatial positional relationship with the image block to be processed, and/or an original reference block having a preset temporal positional relationship with the image block to be processed.
28. The apparatus according to claim 27, wherein the original reference block having the preset spatial positional relationship with the image block to be processed comprises one or more of: an image block located at the upper-left corner of the image block to be processed and adjacent to its upper-left corner point, an image block located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point, and an image block located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point; and the original reference block having the preset spatial positional relationship is located outside the image block to be processed.
29. The apparatus according to claim 27 or 28, wherein the original reference block having the preset temporal positional relationship with the image block to be processed comprises: an image block located at the lower-right corner of a mapped image block in a target reference frame and adjacent to the lower-right corner point of the mapped image block; the original reference block having the preset temporal positional relationship is located outside the mapped image block, the size of the mapped image block is equal to that of the image block to be processed, and the position of the mapped image block in the target reference frame is the same as the position of the image block to be processed in its own image frame.
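A minimal Python sketch of locating the temporal original reference block of claim 29; the top-left coordinate convention is an assumption of the sketch:

    def temporal_reference_block_corner(x0, y0, w, h):
        # Claim 29: the temporal original reference block lies at the
        # lower-right corner of the mapped image block in the target
        # reference frame, adjacent to its lower-right corner point and
        # outside the mapped block. The mapped block shares the position
        # (x0, y0) and size (w, h) of the image block to be processed, so
        # the reference block's top-left corner is one sample beyond both
        # its right and bottom boundaries.
        return (x0 + w, y0 + h)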
30. The apparatus of claim 29, wherein the index information and the reference frame list information of the target reference frame are obtained by parsing a code stream.
31. The apparatus according to claim 30, wherein the index information and the reference frame list information of the target reference frame are located in a code stream segment corresponding to a slice header of the slice in which the image block to be processed is located.
32. The apparatus according to any one of claims 29 to 31, wherein the computing module is specifically configured to:
obtaining a motion vector corresponding to the basic prediction block according to the following formula:
P(x, y) = (H × P_h(x, y) + W × P_v(x, y) + H × W) / (2 × H × W);
wherein:
P_h(x, y) = (W - 1 - x) × L(-1, y) + (x + 1) × R(W, y);
P_v(x, y) = (H - 1 - y) × A(x, -1) + (y + 1) × B(x, H);
R(W, y) = ((H - y - 1) × AR + (y + 1) × BR) / H;
B(x, H) = ((W - x - 1) × BL + (x + 1) × BR) / W;
wherein AR is the motion vector corresponding to the image block that is located at the upper-right corner of the image block to be processed and adjacent to its upper-right corner point; BR is the motion vector corresponding to the image block that is located at the lower-right corner of the mapped image block in the target reference frame and adjacent to the lower-right corner point of the mapped image block; BL is the motion vector corresponding to the image block that is located at the lower-left corner of the image block to be processed and adjacent to its lower-left corner point; x is the ratio of the horizontal distance of the upper-left corner point of the basic prediction block, relative to the upper-left corner point of the image block to be processed, to the width of the basic prediction block; y is the ratio of the vertical distance of the upper-left corner point of the basic prediction block, relative to the upper-left corner point of the image block to be processed, to the height of the basic prediction block; H is the ratio of the height of the image block to be processed to the height of the basic prediction block; W is the ratio of the width of the image block to be processed to the width of the basic prediction block; L(-1, y) is the motion vector corresponding to the second reference block; A(x, -1) is the motion vector corresponding to the first reference block; and P(x, y) is the motion vector corresponding to the basic prediction block.
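A per-component Python sketch of this weighted calculation. Reading the divisions as integer divisions, with the H × W term acting as the rounding offset of the final division, is an assumption about the intended arithmetic; motion vectors are taken as (mvx, mvy) integer pairs.

    def planar_mv(x, y, W, H, L_y, A_x, AR, BL, BR):
        # x, y index the basic prediction block inside the image block to
        # be processed; W and H are the block's width and height in units
        # of basic prediction blocks, as defined after the formula.
        def component(i):
            # R(W, y): right-column vector interpolated between AR and BR.
            R = ((H - y - 1) * AR[i] + (y + 1) * BR[i]) // H
            # B(x, H): bottom-row vector interpolated between BL and BR.
            B = ((W - x - 1) * BL[i] + (x + 1) * BR[i]) // W
            # Horizontal prediction from the left reference block L(-1, y).
            Ph = (W - 1 - x) * L_y[i] + (x + 1) * R
            # Vertical prediction from the above reference block A(x, -1).
            Pv = (H - 1 - y) * A_x[i] + (y + 1) * B
            return (H * Ph + W * Pv + H * W) // (2 * H * W)
        return (component(0), component(1))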
33. The apparatus according to any one of claims 26 to 32, wherein the size reference information comprises a first identifier, and the first identifier is used to indicate the size of the basic prediction block;
the apparatus further comprises: a receiving unit, configured to receive a code stream; and a parsing unit, configured to parse the code stream received by the receiving unit to obtain the first identifier; the first identifier is located in a code stream segment corresponding to any one of a sequence parameter set of the sequence in which the image block to be processed is located, a picture parameter set of the image in which the image block to be processed is located, and a slice header of the slice in which the image block to be processed is located.
34. The apparatus of any one of claims 26 to 33, wherein the determining module is further configured to:
determine the prediction direction of the image block to be processed.
35. The apparatus of claim 34, wherein the determining module is specifically configured to:
when the first-direction prediction is valid and the second-direction prediction is invalid, or the second-direction prediction is valid and the first-direction prediction is invalid, the prediction direction of the image block to be processed is unidirectional prediction;
and when the first-direction prediction is valid and the second-direction prediction is valid, the prediction direction of the image block to be processed is bidirectional prediction.
36. The apparatus of claim 35, wherein the determining module is specifically configured to:
when at least one temporary image block in the neighboring area of the image block to be processed obtains a motion vector using a first reference frame image list, the first-direction prediction is valid;
when no temporary image block in the neighboring area of the image block to be processed obtains a motion vector using the first reference frame image list, the first-direction prediction is invalid;
when at least one temporary image block in the neighboring area of the image block to be processed obtains a motion vector using a second reference frame image list, the second-direction prediction is valid;
and when no temporary image block in the neighboring area of the image block to be processed obtains a motion vector using the second reference frame image list, the second-direction prediction is invalid.
37. The apparatus of claim 35, wherein the determining module is specifically configured to:
the motion vector comprises a first motion vector and/or a second motion vector, and the first-direction prediction is valid when the first motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using a first reference frame image list are different, wherein the first motion vector corresponds to the first reference frame image list;
when the first motion vectors of all temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the first reference frame image list are the same, the first-direction prediction is invalid;
when the second motion vectors of at least two temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using a second reference frame image list are different, the second-direction prediction is valid, wherein the second motion vector corresponds to the second reference frame image list;
and when the second motion vectors of all temporary image blocks in the neighboring area of the image block to be processed that obtain motion vectors using the second reference frame image list are the same, the second-direction prediction is invalid.
38. The apparatus according to any of the claims 36 to 37, wherein the temporary image blocks are image blocks having a preset size.
39. The apparatus according to any of claims 36 to 38, wherein the neighboring area of the image block to be processed comprises one or any combination of: a left spatial region, an upper spatial region, a right temporal region, and a lower temporal region of the image block to be processed.
40. The apparatus according to any of the claims 26 to 34, wherein the size reference information comprises shape information of the image block to be processed; the shape information includes a width and a height;
the determining module is specifically configured to:
when the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
41. The apparatus according to any one of claims 26 to 34, wherein the basic prediction block has a width of 8 pixels and a height of 8 pixels.
42. The apparatus according to any of the claims 36 to 39, wherein the size reference information comprises a prediction direction of the image block to be processed.
43. The apparatus of claim 42, wherein the determining module is specifically configured to:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels, or the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 8 pixels.
44. The apparatus of claim 42, wherein the determining module is specifically configured to:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the prediction direction of the image block to be processed is bidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height of the basic prediction block is 8 pixels.
45. The apparatus of claim 42, wherein the determining module is specifically configured to:
when the prediction direction of the image block to be processed is unidirectional prediction, the width of the basic prediction block is 4 pixels, and the height of the basic prediction block is 4 pixels;
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels.
46. The apparatus of claim 42, wherein the determining module is specifically configured to:
when the prediction direction of the image block to be processed is bidirectional prediction, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 8 pixels;
when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is greater than or equal to the height of the image block to be processed, the width of the basic prediction block is 8 pixels, and the height of the basic prediction block is 4 pixels;
and when the prediction direction of the image block to be processed is unidirectional prediction and the width of the image block to be processed is smaller than the height of the image block to be processed, the width of the basic prediction block is 4 pixels and the height of the basic prediction block is 8 pixels.
47. The apparatus of any one of claims 26 to 46, further comprising:
a dividing module, configured to divide the image block to be processed into a plurality of the basic prediction blocks according to the size, and sequentially determine the position of each basic prediction block in the image block to be processed.
48. The apparatus according to any one of claims 26 to 47, wherein the determining module is further configured to:
determine that the first reference block and the second reference block are located within the boundary of the image in which the image block to be processed is located.
49. The apparatus according to any one of claims 26 to 48, wherein the determining module is further configured to:
determine that the width of the image block to be processed is greater than or equal to 16 and the height of the image block to be processed is greater than or equal to 16; or determine that the width of the image block to be processed is greater than or equal to 16; or determine that the height of the image block to be processed is greater than or equal to 16.
50. The apparatus according to any of the claims 26 to 49, wherein said apparatus is configured to encode said image block to be processed, or to decode said image block to be processed.
51. An apparatus for inter-prediction, the apparatus comprising: one or more processors, memory, and a communication interface;
the memory and the communication interface are coupled to the one or more processors; the apparatus for inter-prediction communicates with other devices through the communication interface; and the memory is configured to store computer program code comprising instructions that, when executed by the one or more processors, cause the apparatus to perform the method of inter-prediction according to any one of claims 1-25.
52. A computer-readable storage medium comprising instructions that, when executed on an apparatus for inter-prediction, cause the apparatus for inter-prediction to perform the method of inter-prediction according to any one of claims 1-25.
53. A computer program product comprising instructions which, when run on an apparatus for inter-prediction, cause the apparatus for inter-prediction to perform the method of inter-prediction according to any one of claims 1-25.
CN201811578340.7A 2018-11-19 2018-12-21 Inter-frame prediction method and device Active CN111200735B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/110206 WO2020103593A1 (en) 2018-11-19 2019-10-09 Inter-frame prediction method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018113778974 2018-11-19
CN201811377897 2018-11-19

Publications (2)

Publication Number Publication Date
CN111200735A true CN111200735A (en) 2020-05-26
CN111200735B CN111200735B (en) 2023-03-17

Family

ID=70747374

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811578340.7A Active CN111200735B (en) 2018-11-19 2018-12-21 Inter-frame prediction method and device

Country Status (2)

Country Link
CN (1) CN111200735B (en)
WO (1) WO2020103593A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116939205A (en) * 2023-04-10 2023-10-24 深圳传音控股股份有限公司 Image processing method, processing apparatus, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685497A (en) * 2012-05-29 2012-09-19 北京大学 Rapid interframe mode selection method and device for AVS (Advanced Audio Video Coding Standard) coder
CN103299642A (en) * 2011-01-07 2013-09-11 Lg电子株式会社 Method for encoding and decoding image information and device using same
US20160219302A1 (en) * 2015-01-26 2016-07-28 Qualcomm Incorporated Overlapped motion compensation for video coding
WO2017086738A1 (en) * 2015-11-19 2017-05-26 한국전자통신연구원 Method and apparatus for image encoding/decoding
CN108353166A (en) * 2015-11-19 2018-07-31 韩国电子通信研究院 Method and apparatus for encoding/decoding image
CN108632616A (en) * 2018-05-09 2018-10-09 电子科技大学 A method of interframe weight estimation is done based on reference mass

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011239365A (en) * 2010-04-12 2011-11-24 Canon Inc Moving image encoding apparatus and method for controlling the same, and computer program
CN102970526B (en) * 2011-08-31 2016-12-14 华为技术有限公司 A kind of method obtaining transform block size and module

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103299642A (en) * 2011-01-07 2013-09-11 Lg电子株式会社 Method for encoding and decoding image information and device using same
CN102685497A (en) * 2012-05-29 2012-09-19 北京大学 Rapid interframe mode selection method and device for AVS (Advanced Audio Video Coding Standard) coder
US20160219302A1 (en) * 2015-01-26 2016-07-28 Qualcomm Incorporated Overlapped motion compensation for video coding
CN107211157A (en) * 2015-01-26 2017-09-26 高通股份有限公司 Overlapped motion compensation for video coding
WO2017086738A1 (en) * 2015-11-19 2017-05-26 한국전자통신연구원 Method and apparatus for image encoding/decoding
CN108353166A (en) * 2015-11-19 2018-07-31 韩国电子通信研究院 Method and apparatus for encoding/decoding image
CN108632616A (en) * 2018-05-09 2018-10-09 电子科技大学 A method of interframe weight estimation is done based on reference mass

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NA ZHANG ET AL: "CE4.2.14:Planar Motion Vector Prediction", 《JVET 11TH MEETING: LJUBLJANA, JVET-K0135》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022027878A1 (en) * 2020-08-04 2022-02-10 深圳市精锋医疗科技有限公司 Image processing method for endoscope
WO2022063265A1 (en) * 2020-09-28 2022-03-31 华为技术有限公司 Inter-frame prediction method and apparatus
CN112966556A (en) * 2021-02-02 2021-06-15 豪威芯仑传感器(上海)有限公司 Moving object detection method and system
CN115037933A (en) * 2022-08-09 2022-09-09 浙江大华技术股份有限公司 Inter-frame prediction method and device

Also Published As

Publication number Publication date
WO2020103593A1 (en) 2020-05-28
CN111200735B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
US9807399B2 (en) Border pixel padding for intra prediction in video coding
US20180098064A1 (en) Variable number of intra modes for video coding
US8995523B2 (en) Memory efficient context modeling
US9699456B2 (en) Buffering prediction data in video coding
US20130163664A1 (en) Unified partition mode table for intra-mode coding
CN111200735B (en) Inter-frame prediction method and device
WO2015038928A1 (en) Partial intra block copying for video coding
CA2839249A1 (en) Unified merge mode and adaptive motion vector prediction mode candidates selection
US11601667B2 (en) Inter prediction method and related apparatus
US20150264367A1 (en) Systems and methods for low complexity encoding and background detection
US20210203944A1 (en) Decoding method and decoding apparatus for predicting motion information
CN110876057B (en) Inter-frame prediction method and device
CN110855993A (en) Method and device for predicting motion information of image block
WO2020038232A1 (en) Method and apparatus for predicting movement information of image block
WO2020024275A1 (en) Inter-frame prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant