WO2019084776A1

WO2019084776A1 - Method and device for obtaining candidate motion information of image block, and codec

Info

Publication number: WO2019084776A1
Application number: PCT/CN2017/108611
Authority: WO
Inventors: 陈旭; 安基程; 郑建铧
Original assignee: 华为技术有限公司
Priority date: 2017-10-31
Filing date: 2017-10-31
Publication date: 2019-05-09

Abstract

A method for obtaining candidate motion information of an image block. The candidate motion information is used for constructing an inter-frame prediction candidate list. The method comprises: detecting one or more spatial reference blocks of a current image block according to a first preset sequence to obtain M sets of original candidate motion information in a candidate list of the current image block; detecting one or more temporal reference blocks of the current image block according to a second preset sequence to obtain L sets of original candidate motion information in a candidate list of an image block to be processed; and if the quantity of candidate motion information in the candidate list of the image to be processed is less than a target quantity, decomposing at least one set of original candidate motion information of the bidirectional prediction type comprised in the candidate list to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed. The technical solution of the present application improves the predictive accuracy of motion vectors of the image block and increases the coding and decoding performance.

Description

Method, device and codec for acquiring candidate motion information of image block

Technical field

The present application relates to the field of video image coding and decoding technologies, and in particular, to a method, an apparatus, an encoder, and a decoder for acquiring candidate motion information of an image block.

Background

Video compression technologies, such as those described in MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and extensions of these standards, allow digital video information to be transmitted and received efficiently between devices. Typically, an image of a video sequence is divided into image blocks for encoding or decoding.

In video compression technology, in order to reduce or remove redundant information in a video sequence, image-block-based spatial prediction (intra prediction) and/or temporal prediction (inter prediction) are introduced. Inter prediction modes may include, but are not limited to, a merge mode and a non-merge mode (for example, an advanced motion vector prediction (AMVP) mode), both of which perform inter prediction by means of competition among multiple sets of motion information.

In the inter prediction process, a candidate list including multiple sets of motion information (also referred to as candidate motion information) is introduced. For example, the encoder may select suitable candidate motion information from the candidate list to predict the motion information (e.g., the motion vector) of the current image block to be encoded, thereby obtaining the best reference image block (i.e., the prediction block) of the current image block to be encoded.

However, in both the merge mode and the non-merge mode, a maximum number of candidates is defined for the candidate list. When the available candidate motion information is insufficient, a default value (e.g., a zero vector) is added to the candidate list as candidate motion information to meet the maximum candidate number, and an index identifier is assigned to each set of candidate motion information. As a result, some candidate motion information in the candidate list has little reference value, which lowers the accuracy of motion vector prediction and thereby degrades codec performance.

Summary of the invention

Embodiments of the present application provide a method and an apparatus for acquiring candidate motion information of an image block, and a corresponding encoder and decoder, which improve the accuracy of motion vector prediction and thereby improve codec performance.

In a first aspect, an embodiment of the present application provides a method for acquiring candidate motion information of an image block, where the candidate motion information is used to construct a candidate list for inter prediction. The method includes: a candidate motion information acquiring device detects, according to a first preset sequence, one or more spatial reference blocks of the current image block to obtain M sets of original candidate motion information in the candidate list of the image block to be processed, where M is an integer greater than or equal to 0; the candidate motion information acquiring device detects, according to a second preset sequence, one or more time domain reference blocks of the current image block to obtain L sets of original candidate motion information in the candidate list of the image block to be processed, where L is an integer greater than or equal to 0; and when the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, the candidate motion information acquiring device performs decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type (also referred to as original candidate motion information using the bidirectional predictive encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type (also referred to as candidate motion information using the unidirectional predictive encoding/decoding mode) in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.

The decomposition here can be understood as the inverse of combination, that is, splitting motion information that uses the bidirectional predictive encoding/decoding mode into one set of motion information that uses the backward predictive encoding/decoding mode and one set of motion information that uses the forward predictive encoding/decoding mode.

It should be noted that a spatial reference block herein refers to a reference block spatially related to the current image block, and may include one or more spatial reference blocks that are adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.

It should be noted that a time domain reference block herein refers to a reference block temporally related to the current image block, and may include one or more spatial reference blocks adjacent to the co-located block in the reference image, and/or one or more sub-blocks of the co-located block, where the co-located block is an image block of the reference image having the same size, shape, and coordinates as the current image block. The reference image herein refers to a reconstructed image. Specifically, it is a reference image in one or more reference image lists; for example, it may be the reference image corresponding to a specified reference image index in a specified reference image list, or the reference image in the first position of a default reference image list, which is not limited in this application. It should be noted that, whichever reference block is used, it is an image block whose motion vector has been determined (also referred to as an encoded image block or a decoded image block).

It should be noted that the motion information of each reference block (i.e., each set of motion information) may include a motion vector (MV) and reference image indication information. The motion information may also include only one of the two, or both; for example, if the encoder and the decoder agree on the reference image, the motion information may include only the motion vector. The reference image indication information indicates which reconstructed image or images are used as the reference image for the current block (here, the current block refers to the currently available reference block in the current segment). The motion vector indicates the positional offset of the reference block position in the used reference image relative to the current block position, and generally includes a horizontal component and a vertical component. For example, when (x, y) represents the MV, x is the positional offset in the horizontal direction and y is the positional offset in the vertical direction. Adding the MV to the position of the current block gives the position of its reference block in the reference image. The reference image indication information may include a reference image list and a reference image index corresponding to the reference image list. The reference image index identifies, within the specified reference image list (RefPicList0 or RefPicList1), the reference image pointed to by the motion vector.
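The position arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: offsets are in whole samples, positions are top-left block coordinates, and the names `MotionVector` and `reference_block_position` are hypothetical and do not come from the application.

```python
from typing import NamedTuple, Tuple


class MotionVector(NamedTuple):
    x: int  # positional offset in the horizontal direction
    y: int  # positional offset in the vertical direction


def reference_block_position(cur_pos: Tuple[int, int], mv: MotionVector) -> Tuple[int, int]:
    """Add the MV to the current block position to locate the reference block."""
    return (cur_pos[0] + mv.x, cur_pos[1] + mv.y)


# Current block at (64, 32), MV (-3, 5): the reference block sits at (61, 37).
print(reference_block_position((64, 32), MotionVector(-3, 5)))
```

Practical codecs usually store motion vectors in fractional-sample units, which this sketch deliberately ignores.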

For a reference block coded with inter prediction, a set of motion information of the reference block may include motion information for the forward and backward prediction directions. Here, forward and backward are the two prediction directions of the bidirectional prediction mode, and "forward" and "backward" correspond to reference image list 0 (RefPicList0) and reference image list 1 (RefPicList1) of the current image, respectively. The "forward" prediction direction (RefPicList0) means that the reference image is temporally before the current image. The "backward" prediction direction (RefPicList1) means that the reference image is temporally after the current image. When only one reference image list is available for an image or a slice, only RefPicList0 is available, and the motion information of each image block of the slice is always forward.

It should be understood that, in different application scenarios, the candidate motion information acquiring device may be a video encoder or a video decoder, for example, may be a motion estimator in a video encoder, or a motion compensator in a video decoder.

It should be noted that the candidate motion information in the candidate list of the image block to be processed may include the foregoing M sets of original candidate motion information and L sets of original candidate motion information, and may also include candidate motion information acquired in other manners, which is not limited in this application.

It can be seen that, by decomposing at least one set of original candidate motion information of the bidirectional prediction type to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type, as much candidate motion information with reference value as possible can be mined, which reduces or avoids, to a certain extent, the use of zero vectors to fill the candidate list. For example, in the same codec application scenario, multiple zero vectors may need to be filled in before the technical solution of the present application is introduced, whereas after it is introduced, no zero vectors or fewer zero vectors may need to be filled in. This improves the accuracy of motion vector prediction to a certain extent, thereby improving codec performance.
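The effect on zero-vector padding can be illustrated with a small sketch. It is not the claimed implementation: reference image indices are omitted, a candidate is reduced to a pair (forward MV, backward MV), and the names `fill` and `ZERO`, as well as the choice of a forward-only zero vector as the default, are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# A candidate is (forward_mv, backward_mv); None marks an unused direction.
Cand = Tuple[Optional[Tuple[int, int]], Optional[Tuple[int, int]]]
ZERO: Cand = ((0, 0), None)  # assumed default padding candidate


def fill(cands: List[Cand], target: int) -> List[Cand]:
    """Extend the list with decomposed candidates first, zero vectors last."""
    out = list(cands)
    for fwd, bwd in cands:
        if fwd is not None and bwd is not None:  # bidirectional candidate
            for new in ((fwd, None), (None, bwd)):
                if len(out) < target and new not in out:
                    out.append(new)
    while len(out) < target:  # pad only what is still missing
        out.append(ZERO)
    return out


original = [((1, 2), (3, 4))]  # one bidirectional candidate
filled = fill(original, target=4)
print(filled.count(ZERO))  # one zero vector instead of three
```

With a single bidirectional candidate and a target of four, padding alone would add three zero vectors; decomposing first leaves only one slot to pad.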

In conjunction with the first aspect, in some implementations of the first aspect, a set of original candidate motion information of the bidirectional prediction type (also referred to as a set of original candidate motion information using the bidirectional predictive encoding/decoding mode) includes motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

After the decomposition processing, the Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type (also referred to as unidirectional predictive encoding/decoding mode) is the forward prediction direction (also referred to as the forward predictive encoding/decoding mode), and/or a set of motion information whose unidirectional prediction type is the backward prediction direction (also referred to as the backward predictive encoding/decoding mode), where the set of motion information of the forward prediction direction includes the first reference image list, the first reference image index corresponding to the first reference image list, and the motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes the second reference image list, the second reference image index corresponding to the second reference image list, and the motion vector pointing to the second reference image corresponding to the second reference image index.
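One possible data layout for this decomposition is sketched below. The class and function names (`DirMotion`, `Candidate`, `decompose`) are hypothetical; the fields mirror the three items named in the text: reference image list, reference image index, and motion vector.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DirMotion:
    ref_list: int        # 0 for RefPicList0, 1 for RefPicList1
    ref_idx: int         # reference image index within that list
    mv: Tuple[int, int]  # motion vector (x, y)


@dataclass(frozen=True)
class Candidate:
    fwd: Optional[DirMotion] = None  # forward prediction direction
    bwd: Optional[DirMotion] = None  # backward prediction direction

    @property
    def is_bi(self) -> bool:
        return self.fwd is not None and self.bwd is not None


def decompose(cand: Candidate) -> List[Candidate]:
    """Split a bidirectional candidate into a forward-only and a backward-only one."""
    if not cand.is_bi:
        return []  # nothing to decompose for a unidirectional candidate
    return [Candidate(fwd=cand.fwd), Candidate(bwd=cand.bwd)]


bi = Candidate(DirMotion(0, 1, (2, -1)), DirMotion(1, 0, (-4, 3)))
for part in decompose(bi):
    print(part)
```

Each new candidate reuses the list, index, and motion vector of one direction unchanged, so no new motion search is implied by this sketch.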

With reference to the first aspect, in some implementations of the first aspect, in order to mine as much additional candidate motion information with reference value as possible and thereby further improve the accuracy of motion vector prediction, the method may further include: when the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, performing combination processing on two sets of original candidate motion information of the unidirectional prediction type (also referred to as original candidate motion information using the unidirectional predictive encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type (also referred to as candidate motion information using the bidirectional predictive encoding/decoding mode) in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.

The combination here refers to combining one set of original candidate motion information whose unidirectional prediction type is the forward prediction direction with another set of original candidate motion information whose unidirectional prediction type is the backward prediction direction, to obtain a set of newly constructed candidate motion information of the bidirectional prediction type; in other words, combining one set of original candidate motion information that uses the forward predictive encoding/decoding mode with another set that uses the backward predictive encoding/decoding mode, to obtain a set of newly constructed candidate motion information that uses the bidirectional predictive encoding/decoding mode.

It should be understood that the combination step may occur before, after, or simultaneously with the decomposition step, which is not limited in this application.
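The combination step can be sketched in the same spirit. The representation is an assumption for illustration: each direction is a pair (reference image index, motion vector), and `combine` is a hypothetical helper that pairs every forward-only candidate with every backward-only candidate.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Tuple

Dir = Tuple[int, Tuple[int, int]]  # (reference image index, motion vector)


class Candidate(NamedTuple):
    fwd: Optional[Dir]  # forward (RefPicList0) motion, or None
    bwd: Optional[Dir]  # backward (RefPicList1) motion, or None


def combine(cands: List[Candidate]) -> List[Candidate]:
    """Build bidirectional candidates from forward-only / backward-only pairs."""
    out = []
    for a, b in permutations(cands, 2):
        a_fwd_only = a.fwd is not None and a.bwd is None
        b_bwd_only = b.bwd is not None and b.fwd is None
        if a_fwd_only and b_bwd_only:
            out.append(Candidate(a.fwd, b.bwd))
    return out


cands = [
    Candidate((0, (1, 2)), None),         # forward only
    Candidate(None, (1, (-3, 0))),        # backward only
    Candidate((2, (5, 5)), (0, (1, 1))),  # already bidirectional, not reused
]
print(combine(cands))
```

Only unidirectional candidates take part here, matching the description of combining two sets of unidirectional original candidate motion information.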

In conjunction with the first aspect, in some implementations of the first aspect, the one or more spatial reference blocks include: one or more spatial reference blocks that are adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the image block to be processed.

In conjunction with the first aspect, in some implementations of the first aspect, the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:

a fourth spatial neighboring block A0 located on the lower left side of the current image block, a first spatial neighboring block A1 located on the left side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, or a fifth spatial neighboring block B2 located on the upper left side of the current image block.

With reference to the first aspect, in some implementations of the first aspect, detecting the one or more spatial reference blocks of the current image block according to the first preset sequence to obtain the M sets of original candidate motion information in the candidate list of the image block to be processed may include:

Detecting, in sequence, whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain motion information of M1 determined motion vector image blocks among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and

Adding M sets of motion information among the motion information of the detected M1 determined motion vector image blocks to the candidate list as candidate motion information, where M1 is greater than or equal to M;

Wherein the detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.

It should be understood that if two or more identical sets of motion information exist among the motion information of the M1 determined motion vector image blocks, only one of the identical sets is added to the candidate list.
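The spatial detection order, the B2 condition, and the removal of identical motion information can be sketched as follows. The function name `spatial_candidates`, the dictionary input, and the `max_m` cap are assumptions for illustration; in particular, this sketch compares each new entry against all earlier ones, which is one possible way to keep only one of several identical sets.

```python
from typing import Dict, Hashable, List, Optional

SPATIAL_ORDER = ["A1", "B1", "B0", "A0"]  # B2 is handled separately


def spatial_candidates(
    neighbors: Dict[str, Optional[Hashable]], max_m: int
) -> List[Hashable]:
    """neighbors maps a position name to its motion information, or None if unavailable."""
    found: List[Hashable] = []
    for name in SPATIAL_ORDER:
        info = neighbors.get(name)
        if info is not None and info not in found:  # skip unavailable or identical
            found.append(info)
    # B2 is detected only when at least one of A1, B1, B0, A0 is unavailable.
    if any(neighbors.get(n) is None for n in SPATIAL_ORDER):
        info = neighbors.get("B2")
        if info is not None and info not in found:
            found.append(info)
    return found[:max_m]  # keep at most M sets


neighbors = {
    "A1": (0, (1, 1)),
    "B1": (0, (1, 1)),  # identical to A1, so only one copy is kept
    "B0": None,         # unavailable, which triggers the B2 check
    "A0": (0, (2, 0)),
    "B2": (1, (3, 3)),
}
print(spatial_candidates(neighbors, max_m=4))
```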

In conjunction with the first aspect, in some implementations of the first aspect, the one or more time domain reference blocks include: a lower-right spatial neighboring block H of the co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, where the co-located block is an image block of the reference image having the same size, shape, and coordinates as the current image block.

With reference to the first aspect, in some implementations of the first aspect, detecting the one or more time domain reference blocks of the current image block according to the second preset sequence to obtain the L sets of original candidate motion information in the candidate list of the image block to be processed includes:

Detecting, in sequence, whether the lower-right spatial neighboring block H of the co-located block and the lower-right intermediate block C3 of the co-located block are available, to obtain motion information of L1 determined motion vector image blocks; or

Detecting, in sequence, whether the lower-right spatial neighboring block H of the co-located block and the upper-left intermediate block C0 of the co-located block are available, to obtain motion information of L2 determined motion vector image blocks; or

Detecting, in sequence, whether the lower-right spatial neighboring block H of the co-located block, the lower-right intermediate block C3 of the co-located block, the upper-left block TL of the co-located block, the lower-right block BR of the co-located block, and the upper-left intermediate block C0 of the co-located block are available, to obtain motion information of L3 determined motion vector image blocks;

Adding L sets of motion information among the motion information of the detected L1, L2, or L3 determined motion vector image blocks to the candidate list as candidate motion information, where L1 is greater than or equal to L, or L2 is greater than or equal to L, or L3 is greater than or equal to L, and L1, L2, and L3 are all integers greater than or equal to 0.
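The three alternative detection orders can be sketched with one helper that takes the order as a parameter. The names `temporal_candidates`, `ORDER_H_C3`, `ORDER_H_C0`, and `ORDER_FULL`, and the use of strings as stand-ins for motion information, are assumptions for illustration.

```python
from typing import Dict, Hashable, List, Optional, Sequence

ORDER_H_C3 = ("H", "C3")                    # first alternative
ORDER_H_C0 = ("H", "C0")                    # second alternative
ORDER_FULL = ("H", "C3", "TL", "BR", "C0")  # third alternative


def temporal_candidates(
    blocks: Dict[str, Optional[Hashable]], order: Sequence[str], max_l: int
) -> List[Hashable]:
    """Scan temporal reference blocks in the given order; keep at most max_l sets."""
    found = [blocks.get(name) for name in order if blocks.get(name) is not None]
    return found[:max_l]


blocks = {"H": None, "C3": "mv_c3", "C0": "mv_c0"}  # H unavailable in this example
print(temporal_candidates(blocks, ORDER_H_C3, max_l=1))
print(temporal_candidates(blocks, ORDER_FULL, max_l=2))
```

Passing the order as data keeps the scanning logic identical across the three alternatives, which is a design choice of this sketch rather than something the application prescribes.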

With reference to the first aspect, in some implementations of the first aspect, the target quantity is a preset maximum number of candidate motion information in the candidate list of the current image block; or the target quantity is the number of candidate motion information determined using an index identifier parsed from the code stream.

A second aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, where the candidate motion information is used to construct a candidate list for inter prediction. The apparatus includes: a spatial candidate motion information acquiring module, configured to detect one or more spatial reference blocks of the current image block according to a first preset sequence, to obtain M sets of original candidate motion information in the candidate list of the image block to be processed, where M is an integer greater than or equal to 0; a time domain candidate motion information acquiring module, configured to detect one or more time domain reference blocks of the current image block according to a second preset sequence, to obtain L sets of original candidate motion information in the candidate list of the image block to be processed, where L is an integer greater than or equal to 0; and an additional candidate motion information acquiring module, configured to: when the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, decompose at least one set of original candidate motion information of the bidirectional prediction type (motion information of the bidirectional predictive encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type (motion information of the unidirectional predictive encoding/decoding mode) in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.

With reference to the second aspect, in some implementations of the second aspect, a set of original candidate motion information of the bidirectional prediction type (a set of motion information of the bidirectional predictive encoding/decoding mode) includes motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

After the decomposition by the additional candidate motion information acquiring module, the Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type (unidirectional predictive encoding/decoding mode) is the forward predictive encoding/decoding mode, and/or a set of motion information whose unidirectional prediction type (unidirectional predictive encoding/decoding mode) is the backward predictive encoding/decoding mode, where the set of motion information of the forward predictive encoding/decoding mode includes the first reference image list, the first reference image index corresponding to the first reference image list, and the motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward predictive encoding/decoding mode includes the second reference image list, the second reference image index corresponding to the second reference image list, and the motion vector pointing to the second reference image corresponding to the second reference image index.

It should be understood that if a set of newly constructed candidate motion information of the unidirectional prediction type duplicates candidate motion information already present in the candidate list, that set is not placed in the candidate list. Alternatively, the newly constructed candidate motion information of the unidirectional prediction type may first be placed in the candidate list, and a deduplication operation performed afterwards.
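The two deduplication orders described above can be sketched side by side. The helper names `add_with_check` and `add_then_dedup`, the `target` cap, and the string stand-ins for candidates are assumptions for illustration.

```python
from typing import Hashable, Iterable, List


def add_with_check(
    cand_list: List[Hashable], new: Iterable[Hashable], target: int
) -> List[Hashable]:
    """Variant 1: a new candidate that duplicates an existing entry is never inserted."""
    out = list(cand_list)
    for c in new:
        if len(out) >= target:
            break
        if c not in out:
            out.append(c)
    return out


def add_then_dedup(
    cand_list: List[Hashable], new: Iterable[Hashable], target: int
) -> List[Hashable]:
    """Variant 2: insert everything first, then drop duplicates (first occurrence wins)."""
    merged = list(cand_list) + list(new)
    out: List[Hashable] = []
    for c in merged:
        if c not in out:
            out.append(c)
    return out[:target]


existing = ["bi", "fwd_a"]
new = ["fwd_a", "bwd_b"]  # "fwd_a" duplicates an existing entry
print(add_with_check(existing, new, target=5))
print(add_then_dedup(existing, new, target=5))
```

In this sketch both variants yield the same list; they differ only in when the comparison happens.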

With reference to the second aspect, in some implementations of the second aspect, when the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, the additional candidate motion information acquiring module is further configured to: combine two sets of original candidate motion information of the unidirectional prediction type (unidirectional predictive encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type (candidate motion information of the bidirectional predictive encoding/decoding mode) in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.

In conjunction with the second aspect, in some implementations of the second aspect, the one or more spatial reference blocks include: one or more spatial reference blocks that are adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the image block to be processed.

In conjunction with the second aspect, in an implementation of the second aspect, the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:

With reference to the second aspect, in some implementations of the second aspect, the spatial candidate motion information acquiring module is configured to: detect, in sequence, whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain motion information of M1 determined motion vector image blocks among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and add M sets of motion information among the motion information of the detected M1 determined motion vector image blocks to the candidate list as candidate motion information, where M1 is greater than or equal to M; wherein the detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.

With reference to the second aspect, in some implementations of the second aspect, the one or more time domain reference blocks include: a lower-right spatial neighboring block H of the co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, where the co-located block is an image block of the reference image having the same size, shape, and coordinates as the current image block.

With reference to the second aspect, in an implementation manner of the second aspect, the time domain candidate motion information acquiring module is configured to:

In conjunction with the second aspect, in some implementations of the second aspect, the apparatus is configured to encode or decode a video image, and the target number is a preset maximum number of candidate motion information in the candidate list of the current image block; alternatively, the apparatus is configured to decode a video image, and the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.
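One way to read the two definitions of the target number is sketched below. The text does not state how the number is derived from the parsed index identifier; the sketch assumes a zero-based index, so that the list needs index + 1 entries, and caps the result at the preset maximum. Both the function name `target_quantity` and this derivation are assumptions.

```python
from typing import Optional


def target_quantity(preset_max: int, parsed_index: Optional[int] = None) -> int:
    """Return the number of candidates the list has to contain."""
    if parsed_index is None:
        # Encoder, or decoder that always builds the full list.
        return preset_max
    # Assumed decoder shortcut: entries after the signalled index are never used.
    return min(parsed_index + 1, preset_max)


print(target_quantity(5))     # full list of 5 candidates
print(target_quantity(5, 2))  # build only up to the third candidate
```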

A third aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, where the candidate motion information is used to construct a candidate list for inter prediction. The apparatus includes a processor and a memory coupled to the processor, and the processor is configured to: detect, according to a first preset sequence, one or more spatial reference blocks of the current image block to obtain M sets of original candidate motion information in the candidate list of the image block to be processed, where M is an integer greater than or equal to 0; detect, according to a second preset sequence, one or more time domain reference blocks of the current image block to obtain L sets of original candidate motion information in the candidate list of the image block to be processed, where L is an integer greater than or equal to 0; and when the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type (original candidate motion information of the bidirectional predictive encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.

In conjunction with the third aspect, in some implementations of the third aspect, a set of original candidate motion information of the bidirectional prediction type (also referred to as a set of original candidate motion information using the bidirectional predictive encoding/decoding mode) includes motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

With reference to the third aspect, in some implementations of the third aspect, in order to mine as much reference candidate motion information as possible and thereby further improve the accuracy of motion vector prediction, the processor is further configured to: when the number of candidate motion information in the candidate list of the to-be-processed image block is less than the target number, combine two sets of original candidate motion information of the unidirectional prediction type (also referred to as original candidate motion information of the unidirectional predictive encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type (also referred to as candidate motion information of the bidirectional predictive encoding/decoding mode) in the candidate list of the to-be-processed image block, where P is an integer greater than or equal to 0.

In conjunction with the third aspect, in some implementations of the third aspect, the one or more spatial reference blocks include: one or more spatial reference blocks that are adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks that are not adjacent to the to-be-processed image block in the image in which the current image block is located.

In conjunction with the third aspect, in some implementations of the third aspect, the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:

With reference to the third aspect, in some implementations of the third aspect, in the aspect of detecting the one or more spatial reference blocks of the current image block in the first preset order to obtain the M sets of original candidate motion information in the candidate list of the to-be-processed image block, the processor is configured to: sequentially detect whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain the motion information of M1 image blocks with determined motion vectors among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and add M sets of motion information, out of the motion information of the detected M1 image blocks with determined motion vectors, to the candidate list as candidate motion information, where M1 is greater than or equal to M. The detection condition of the fifth spatial neighboring block B2 includes: when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable, the fifth spatial neighboring block B2 is detected.

It should be understood that if the motion information of the M1 image blocks with determined motion vectors contains two or more sets of motion information that are identical to each other, the processor adds only one of those identical sets of motion information to the candidate list.
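The detection order, the condition on B2, and the pruning of identical motion information described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `collect_spatial_candidates`, the dictionary of neighbors, and the tuple encoding of motion information are hypothetical, and an unavailable block is represented by `None`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (reference list, reference index, (mv_x, mv_y)).
MotionInfo = Tuple[int, int, Tuple[int, int]]

# First preset order for the four blocks that are always detected.
SPATIAL_ORDER = ("A1", "B1", "B0", "A0")


def collect_spatial_candidates(
    neighbors: Dict[str, Optional[MotionInfo]], max_count: int
) -> List[MotionInfo]:
    """Return at most max_count (M) spatial candidates in detection order."""
    found: List[MotionInfo] = []

    def try_add(name: str) -> None:
        info = neighbors.get(name)
        # Skip unavailable blocks and motion information already collected.
        if info is not None and info not in found:
            found.append(info)

    for name in SPATIAL_ORDER:
        try_add(name)

    # B2 is detected only when at least one of A1, B1, B0, A0 is unavailable.
    if any(neighbors.get(name) is None for name in SPATIAL_ORDER):
        try_add("B2")

    return found[:max_count]
```

For example, with B1 carrying the same motion information as A1 and B0 unavailable, the function returns the motion information of A1, A0, and B2, in that order.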

In conjunction with the third aspect, in some implementations of the third aspect, the one or more time-domain reference blocks include: a lower-right spatial neighboring block H of a co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, wherein the co-located block is an image block in the reference image that has the same size, shape, and coordinates as the current image block.

With reference to the third aspect, in some implementations of the third aspect, in the aspect of detecting the one or more time-domain reference blocks of the current image block in the second preset order to obtain the L sets of original candidate motion information in the candidate list of the to-be-processed image block, the processor is configured to: sequentially detect whether the lower-right spatial neighboring block H of the co-located block and the lower-right intermediate block C3 of the co-located block are available, to obtain motion information of L1 image blocks with determined motion vectors; or sequentially detect whether the lower-right spatial neighboring block H of the co-located block and the upper-left intermediate block C0 of the co-located block are available, to obtain motion information of L2 image blocks with determined motion vectors; or sequentially detect whether the lower-right spatial neighboring block H of the co-located block, the lower-right intermediate block C3 of the co-located block, the upper-left block TL of the co-located block, the lower-right block BR of the co-located block, and the upper-left intermediate block C0 of the co-located block are available, to obtain motion information of L3 image blocks with determined motion vectors; and add L sets of motion information, out of the motion information of the detected L1, L2, or L3 image blocks with determined motion vectors, to the candidate list as candidate motion information, where L1 is greater than or equal to L, or L2 is greater than or equal to L, or L3 is greater than or equal to L, and L1, L2, and L3 are integers greater than or equal to 0.
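The three alternative detection orders for the time-domain reference blocks can be illustrated with a short sketch. The names `collect_temporal_candidates`, `ORDER_H_C3`, `ORDER_H_C0`, and `ORDER_FULL`, as well as the tuple encoding of motion information, are assumptions introduced for this example; an unavailable block is represented by `None` or by a missing key.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding: (reference list, reference index, (mv_x, mv_y)).
MotionInfo = Tuple[int, int, Tuple[int, int]]

# The three alternative "second preset orders" named in the text.
ORDER_H_C3 = ("H", "C3")
ORDER_H_C0 = ("H", "C0")
ORDER_FULL = ("H", "C3", "TL", "BR", "C0")


def collect_temporal_candidates(
    blocks: Dict[str, Optional[MotionInfo]],
    order: Sequence[str],
    max_count: int,
) -> List[MotionInfo]:
    """Detect blocks in the given order and keep at most max_count (L) of them."""
    available = [blocks[name] for name in order if blocks.get(name) is not None]
    return available[:max_count]
```

With `max_count` set to 1, the sketch falls back from H to the next available block in the chosen order, which is the behavior a single temporal candidate would need.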

With reference to the third aspect, in some implementations of the third aspect, the target number is a preset maximum number of candidate motion information in the candidate list of the current image block; or the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.

A fourth aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, configured to acquire candidate motion information to construct a candidate list for inter prediction, the apparatus including a processor and a memory coupled to the processor.

The processor 1201 is configured to: detect one or more spatial reference blocks of the current image block in a first preset order, to obtain M sets of original candidate motion information for constructing a candidate list of the current image block, where M is an integer greater than or equal to 0; detect one or more time-domain reference blocks of the current image block in a second preset order, to obtain L sets of original candidate motion information for constructing the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0; and when the number of candidate motion information for constructing the candidate list of the to-be-processed image block is less than the target number, decompose at least one set of original candidate motion information of the bidirectional prediction type included in the candidate motion information for constructing the candidate list of the to-be-processed image block, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type for constructing the candidate list of the to-be-processed image block, where Q is an integer greater than or equal to 0.

With reference to the fourth aspect, in some implementations of the fourth aspect, the set of original candidate motion information of the bidirectional prediction type includes: motion information for a forward prediction direction and motion information for a backward prediction direction. The motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index. The motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

After the decomposition processing, the Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type is the forward prediction direction and/or a set of motion information whose unidirectional prediction type is the backward prediction direction, wherein the set of motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
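The decomposition step can be sketched as below: each bidirectional candidate is split into a forward-only and a backward-only candidate, and these are appended until the target number is reached. The class names `DirMotion` and `Candidate`, the function `fill_by_decomposition`, and the choice to skip a decomposed candidate that already exists in the list are assumptions made for this illustration; the text above does not prescribe them.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DirMotion:
    """Motion information for one prediction direction (hypothetical layout)."""
    ref_list: int          # reference image list (0: first list, 1: second list)
    ref_idx: int           # reference image index within that list
    mv: Tuple[int, int]    # motion vector pointing to the reference image


@dataclass(frozen=True)
class Candidate:
    forward: Optional[DirMotion] = None
    backward: Optional[DirMotion] = None

    @property
    def is_bi(self) -> bool:
        return self.forward is not None and self.backward is not None


def fill_by_decomposition(cands: List[Candidate], target: int) -> List[Candidate]:
    """Append unidirectional candidates split from bidirectional ones."""
    out = list(cands)
    for cand in cands:
        if len(out) >= target:
            break
        if not cand.is_bi:
            continue
        # One forward-only and one backward-only candidate per original.
        for uni in (Candidate(forward=cand.forward), Candidate(backward=cand.backward)):
            if len(out) < target and uni not in out:
                out.append(uni)
    return out
```

In this sketch Q is simply the number of entries appended, so it can be 0 when the list is already full or contains no bidirectional candidate.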

With reference to the fourth aspect, in some implementations of the fourth aspect, in order to mine as much reference candidate motion information as possible and thereby further improve the accuracy of motion vector prediction, the processor is further configured to:

when the number of candidate motion information for constructing the candidate list of the to-be-processed image block is less than the target number, combine two sets of original candidate motion information of the unidirectional prediction type included in the candidate motion information for constructing the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type for constructing the candidate list of the to-be-processed image block, where P is an integer greater than or equal to 0.
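The combination step can be sketched in the same spirit. The example assumes that one of the two unidirectional candidates is forward-only and the other backward-only, and that the position of a field (`forward` or `backward`) implies the reference image list; the names `DirMotion`, `Candidate`, and `fill_by_combination` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class DirMotion(NamedTuple):
    ref_idx: int           # reference image index in the list for this direction
    mv: Tuple[int, int]    # motion vector pointing to that reference image


class Candidate(NamedTuple):
    forward: Optional[DirMotion]    # assumed to use the first reference image list
    backward: Optional[DirMotion]   # assumed to use the second reference image list


def fill_by_combination(cands: List[Candidate], target: int) -> List[Candidate]:
    """Append bidirectional candidates built from pairs of unidirectional ones."""
    out = list(cands)
    fwd_only = [c for c in cands if c.forward is not None and c.backward is None]
    bwd_only = [c for c in cands if c.backward is not None and c.forward is None]
    for f in fwd_only:
        for b in bwd_only:
            if len(out) >= target:
                return out
            combined = Candidate(f.forward, b.backward)
            if combined not in out:
                out.append(combined)
    return out
```

Here P is the number of appended entries; it is 0 when the list holds no forward-only/backward-only pair or is already at the target number.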

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the one or more spatial reference blocks include: one or more spatial reference blocks that are adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks that are not adjacent to the to-be-processed image block in the image in which the current image block is located.

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:

In conjunction with the fourth aspect, in some implementations of the fourth aspect, the one or more time-domain reference blocks include: a lower-right spatial neighboring block H of a co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, wherein the co-located block is an image block in the reference image that has the same size, shape, and coordinates as the current image block.

With reference to the fourth aspect, in some implementations of the fourth aspect, the target number is a preset maximum number of candidate motion information in the candidate list of the current image block; or the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.

A fifth aspect of the present application provides a video encoder for encoding an image block, including: an inter predictor, wherein the inter predictor includes the apparatus for acquiring candidate motion information of an image block according to the second aspect, the third aspect, or the fourth aspect, and the inter predictor is configured to determine a prediction block of a current image block to be encoded based on candidate motion information selected from the candidate list; an entropy encoder, configured to encode an index identifier into the code stream, the index identifier indicating the candidate motion information selected for the current image block to be encoded; and a reconstructor, configured to reconstruct the image block based on the prediction block.

In an example implementation, the inter predictor herein may include a motion estimation module and a motion compensation module, where the motion estimation module is configured to acquire candidate motion information of the current image block to be encoded to construct a candidate list, and the motion compensation module is configured to determine a prediction block of the current image block to be encoded based on the candidate motion information selected from the candidate list.

With reference to the fifth aspect, in some implementations of the fifth aspect, the inter predictor is further configured to select candidate motion information for the current image block to be encoded from the plurality of candidate motion information included in the candidate list, wherein the selected candidate motion information yields the lowest rate-distortion cost for encoding the current image block to be encoded.
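Selecting the candidate with the lowest rate-distortion cost can be sketched as follows. The text does not define the cost function, so the example assumes the commonly used Lagrangian form J = D + λ·R; the function name `select_candidate`, the `distortion` and `rate` callbacks, and the multiplier `lam` are hypothetical, and the candidate list is assumed to be non-empty.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_candidate(
    candidates: Sequence[T],
    distortion: Callable[[T], float],
    rate: Callable[[int], float],
    lam: float,
) -> Tuple[int, float]:
    """Return (index, cost) of the candidate minimising J = D + lam * R.

    distortion(c) measures the prediction error when candidate c is used;
    rate(i) estimates the bits needed to signal index identifier i.
    """
    costs = [distortion(c) + lam * rate(i) for i, c in enumerate(candidates)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

The returned index plays the role of the index identifier that the entropy encoder writes into the code stream.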

A sixth aspect of the present application provides a video decoder for decoding an image block from a code stream, including: an entropy decoder, configured to decode an index identifier from the code stream, where the index identifier indicates the candidate motion information selected for the image block currently to be decoded; an inter predictor, including the apparatus for acquiring candidate motion information of an image block according to the second aspect, the third aspect, or the fourth aspect, wherein the inter predictor is configured to determine a prediction block of the image block currently to be decoded based on the candidate motion information indicated by the index identifier; and a reconstructor, configured to reconstruct the image block based on the prediction block.

A seventh aspect of the present application provides a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the method of the first aspect described above.

An eighth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.

A ninth aspect of the present application provides an electronic device, including the video encoder according to the fifth aspect, or the video decoder according to the sixth aspect, or the apparatus for acquiring candidate motion information of an image block according to the second, third, or fourth aspect.

It should be understood that the second to ninth aspects of the present application are consistent with the technical solutions of the first aspect of the present application, and the beneficial effects obtained by the aspects and the corresponding implementable design manners are similar, and are not described again.

DRAWINGS

FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application;

FIG. 2 is a schematic block diagram of a video encoder in an embodiment of the present application;

FIG. 3 is a schematic block diagram of a video decoder in an embodiment of the present application;

FIG. 4 is an exemplary flowchart of an encoding method performed by a video encoder in a merge mode in an embodiment of the present application;

FIG. 5 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application;

FIG. 6A and FIG. 6B are schematic diagrams showing a coding unit and the adjacent-position image blocks and non-adjacent-position image blocks associated with it in an embodiment of the present application;

FIG. 7 is an exemplary flowchart of a method for acquiring candidate motion information of an image block according to an embodiment of the present application;

FIG. 8 is another exemplary flowchart of a method for acquiring candidate motion information of an image block according to an embodiment of the present application;

FIG. 9 is an exemplary schematic diagram of adding a decomposed candidate motion vector to a merge mode candidate list in the embodiment of the present application;

FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate list in an embodiment of the present application;

FIG. 11 is a schematic block diagram of an apparatus for acquiring candidate motion information of an image block in an embodiment of the present application;

FIG. 12 is a schematic block diagram of an encoding device or a decoding device according to an embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described in the following with reference to the accompanying drawings in the embodiments.

FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 in an embodiment of the present application. As shown in FIG. 1, system 10 includes source device 12, which produces encoded video data that will be decoded by destination device 14 at a later time. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" touchpads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like. In some applications, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive encoded video data to be decoded via link 16. Link 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14. In one possible implementation, link 16 may include communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data can be modulated and transmitted to destination device 14 in accordance with a communication standard (e.g., a wireless communication protocol). Communication media can include any wireless or wired communication medium, such as a radio frequency spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet). Communication media can include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.

Alternatively, the encoded data may be output from output interface 22 to storage device 24. Similarly, encoded data can be accessed from storage device 24 by an input interface. Storage device 24 may comprise any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray Disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another possible implementation, storage device 24 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by source device 12. Destination device 14 may access the stored video data from storage device 24 via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting this encoded video data to destination device 14. Possible implementations of the file server include a web server, a file transfer protocol server, a network attached storage device, or a local disk unit. Destination device 14 can access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a cable modem), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 24 may be streaming, downloading, or a combination of both.

The techniques of this application are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air broadcast, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some possible implementations, system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the possible implementation of FIG. 1, source device 12 includes video source 18, video encoder 20, and output interface 22. In some applications, output interface 22 can include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include sources such as a video capture device (e.g., a camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as source video, or a combination of these sources. As a possible implementation, if video source 18 is a video camera, source device 12 and destination device 14 may form a so-called camera phone or video phone. The techniques described in this application are illustratively applicable to video coding and are applicable to wireless and/or wired applications.

Captured, pre-captured, or computer generated video may be encoded by video encoder 20. The encoded video data can be transmitted directly to the destination device 14 via the output interface 22 of the source device 12. The encoded video data may also (or alternatively) be stored on storage device 24 for later access by destination device 14 or other device for decoding and/or playback.

The destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some applications, input interface 28 can include a receiver and/or a modem. Input interface 28 of destination device 14 receives encoded video data via link 16. The encoded video data communicated via link 16 or provided on storage device 24 may include various syntax elements generated by video encoder 20 for use by a video decoder, such as video decoder 30, in decoding the video data. These syntax elements may be included with encoded video data transmitted over a communication medium, stored on a storage medium, or stored on a file server.

Display device 32 may be integrated with destination device 14 or external to destination device 14. In some possible implementations, destination device 14 can include an integrated display device and is also configured to interface with an external display device. In other possible implementations, the destination device 14 can be a display device. In general, display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.

Video encoder 20 and video decoder 30 may operate in accordance with, for example, the next-generation video codec compression standard (H.266) currently under development and may conform to the H.266 Test Model (JEM). Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, for example, the ITU-T H.265 standard (also referred to as the high efficiency video coding standard), the ITU-T H.264 standard, or extensions of these standards, where the ITU-T H.264 standard is alternatively referred to as MPEG-4 Part 10, advanced video coding (AVC). However, the techniques of this application are not limited to any particular coding standard. Other possible implementations of the video compression standard include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 1, in some aspects video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some possible implementations, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).

Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are partially implemented in software, the apparatus may store the instructions of the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present application. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The present application may illustratively refer to video encoder 20 "signaling" particular information to another device, such as video decoder 30. However, it should be understood that video encoder 20 may signal information by associating particular syntax elements with various encoded portions of the video data. That is, video encoder 20 may "signal" the data by storing the particular syntax elements to the header information of the various encoded portions of the video data. In some applications, these syntax elements may be encoded and stored (e.g., stored to storage system 34 or file server 36) prior to being received and decoded by video decoder 30. Thus, the term "signaling" may illustratively refer to the communication of syntax or other data used to decode compressed video data, whether this communication occurs in real time or near real time or occurs over a span of time, such as may occur when syntax elements are stored to a medium at the time of encoding, after which the syntax elements can be retrieved by the decoding device at any time.

JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolution model of a video decoding device called the HEVC Test Model (HM). The latest standard documentation for H.265 is available at http://www.itu.int/rec/T-REC-H.265. The latest version of the standard document is H.265 (12/16), the full text of which is incorporated herein by reference. The HM assumes that the video decoding device has several additional capabilities with respect to existing algorithms of ITU-T H.264/AVC. For example, H.264 provides nine intra-prediction coding modes, while HM provides up to 35 intra-prediction coding modes.

JVET is committed to the development of the H.266 standard. The H.266 standardization process is based on an evolution model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is included in JVET-F1001-v2, which is incorporated herein by reference in its entirety. The reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.

In general, the working model description of HM can divide a video frame or image into a sequence of tree blocks or largest coding units (LCUs) containing both luminance and chrominance samples, also referred to as CTUs. Tree blocks have a purpose similar to that of macroblocks in the H.264 standard. A stripe contains several consecutive tree blocks in decoding order. A video frame or image can be segmented into one or more stripes. Each tree block can be split into coding units according to a quadtree. For example, a tree block that is the root node of a quadtree can be split into four child nodes, and each child node can in turn be a parent node and be split into four further child nodes. The final non-splittable child nodes, which are the leaf nodes of the quadtree, include decoding nodes, such as decoded video blocks. The syntax data associated with the decoded code stream may define the maximum number of times a tree block can be split, and may also define the minimum size of a decoding node.
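The recursive quadtree split of a tree block into leaf nodes can be illustrated with a short sketch. The function `split_treeblock`, the `should_split` callback, and the `(x, y, size)` tuple are assumptions introduced for the example; a real encoder would drive the split decision from its own mode decision, and a decoder from parsed syntax data. Only the minimum node size is modeled here, not a maximum split depth.

```python
from typing import Callable, List, Tuple

# A square block described by its top-left corner and side length.
Block = Tuple[int, int, int]


def split_treeblock(
    x: int,
    y: int,
    size: int,
    min_size: int,
    should_split: Callable[[int, int, int], bool],
) -> List[Block]:
    """Return the leaf nodes of the quadtree rooted at (x, y, size)."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves: List[Block] = []
        # Visit the four child nodes in raster order.
        for dy in (0, half):
            for dx in (0, half):
                leaves += split_treeblock(x + dx, y + dy, half, min_size, should_split)
        return leaves
    # A node that is not split any further is a leaf (decoding node).
    return [(x, y, size)]
```

For instance, splitting a 64×64 tree block once yields four 32×32 leaves; splitting only its top-left child again yields seven leaves in total.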

A coding unit (CU) includes a decoding node and the prediction units (PUs) and transform units (TUs) associated with the decoding node. The size of the CU corresponds to the size of the decoding node, and the shape must be square. The size of the CU may range from 8×8 pixels up to the size of the tree block, with a maximum of 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe how the CU is partitioned into one or more PUs. The partitioning mode may differ depending on whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. A PU can be partitioned into a non-square shape. For example, syntax data associated with a CU may also describe how the CU is partitioned into one or more TUs according to a quadtree. The shape of a TU can be square or non-square.

The HEVC standard allows for transforms based on TUs, which can be different for different CUs. The TU is typically sized based on the size of the PUs within a given CU defined for the partitioned LCU, although this may not always be the case. The size of the TU is usually the same as or smaller than that of the PU. In some possible implementations, the residual samples corresponding to the CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT). The leaf nodes of the RQT can be referred to as TUs. The pixel difference values associated with a TU may be transformed to produce transform coefficients, which may be quantized.

In general, a PU contains data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing the intra prediction mode of the PU. As another possible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or the reference image list of the motion vector (e.g., list 0, list 1, or list C).

In general, TUs are used for the transform and quantization processes. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 20 may calculate residual values corresponding to the PU. The residual values include pixel difference values, which can be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding. The present application generally uses the term "video block" to refer to a decoding node of a CU. In some specific applications, the term "video block" may also be used herein to refer to a tree block containing a decoding node as well as PUs and TUs, e.g., an LCU or CU.

A video sequence usually contains a series of video frames or images. A group of pictures (GOP) illustratively includes a series of one or more video images. The GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere, the syntax data describing the number of images included in the GOP. Each slice of an image may contain slice syntax data describing the encoding mode of the corresponding image. Video encoder 20 typically operates on video blocks within individual video slices to encode the video data. A video block may correspond to a coding node within a CU. Video blocks may have fixed or varying sizes and may differ in size depending on the specified coding standard.

As a possible implementation, the HM supports prediction with various PU sizes. Assuming that the size of a specific CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not divided, and the other direction is divided into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by "n" followed by "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU, with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
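
The partition modes listed above can be tabulated in a few lines. The following Python sketch is illustrative only: the function name `pu_sizes` and the mode strings are hypothetical, and N is assumed to be even so that 0.5N is an integer.

```python
def pu_sizes(mode, n):
    """Return the (width, height) of each PU of a 2N x 2N CU for a partition mode.

    Hypothetical helper for illustration; assumes n is even so 0.5N is an integer.
    """
    s = 2 * n      # side length of the 2N x 2N CU
    q = n // 2     # 0.5N: the short side of the 25% partition
    table = {
        "2Nx2N": [(s, s)],
        "2NxN": [(s, n), (s, n)],
        "Nx2N": [(n, s), (n, s)],
        "NxN": [(n, n)] * 4,
        "2NxnU": [(s, q), (s, s - q)],   # small PU on top
        "2NxnD": [(s, s - q), (s, q)],   # small PU at the bottom
        "nLx2N": [(q, s), (s - q, s)],   # small PU on the left
        "nRx2N": [(s - q, s), (q, s)],   # small PU on the right
    }
    return table[mode]
```

For a 32×32 CU (N = 16), `pu_sizes("2NxnU", 16)` gives a 32×8 PU above a 32×24 PU, which matches the 25%/75% split described above.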

In the present application, "N x N" and "N by N" are used interchangeably to refer to the pixel size of a video block in accordance with the vertical dimension and the horizontal dimension, for example, 16 x 16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels (y=16) in the vertical direction and 16 pixels (x=16) in the horizontal direction. Likewise, an N x N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in the block can be arranged in rows and columns. Further, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N x M pixels, where M is not necessarily equal to N.

After intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data for the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between pixels of the unencoded image and predicted values corresponding to the PUs. Video encoder 20 may form TUs that include residual data for the CU, and then transform the TUs to generate transform coefficients for the CU.

After any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression. The quantization process can reduce the bit depth associated with some or all of the coefficients. For example, the n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
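
The bit-depth reduction mentioned above can be illustrated with a right shift. This is a minimal sketch of rounding an n-bit value down to m bits only; it is not the quantizer of any particular standard, which would normally divide by a step size derived from a quantization parameter. The function name is hypothetical.

```python
def round_down_bits(value, n, m):
    """Map an unsigned n-bit value to an m-bit value by dropping the (n - m) low bits."""
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    if not 0 <= value < (1 << n):
        raise ValueError("value does not fit in n bits")
    # Discarding low-order bits is equivalent to rounding down to a multiple of 2**(n - m).
    return value >> (n - m)
```

For example, with n = 10 and m = 8 the two least significant bits are discarded, so information is lost and the step is not invertible.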

The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "quadtree plus binary tree" (QTBT) is introduced. The QTBT structure abandons the separate concepts of CU, PU, and TU used in HEVC and supports more flexible CU partition shapes. A CU can be square or rectangular. A CTU is first partitioned by a quadtree, and the leaf nodes of the quadtree are further partitioned by a binary tree. Binary tree partitioning has two split modes: symmetric horizontal splitting and symmetric vertical splitting. The leaf nodes of the binary tree are called CUs, and a CU in the JEM is not further divided during prediction and transformation; that is, the CU, PU, and TU in the JEM have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luma pixels.
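
A small sketch may help visualize the quadtree-then-binary-tree order. The nested-tuple tree encoding, the split labels "Q", "H", and "V", and the function name `leaf_cus` are all assumptions made for this illustration; they are not syntax from the JEM.

```python
def leaf_cus(x, y, w, h, tree, in_binary=False):
    """Return the leaf CUs (x, y, width, height) of a block under a QTBT-style split tree.

    tree is None for a leaf, or (kind, children) with kind in {"Q", "H", "V"}.
    """
    if tree is None:
        return [(x, y, w, h)]
    kind, children = tree
    if kind == "Q":
        # Quadtree splits come first; they are not allowed below a binary split.
        if in_binary:
            raise ValueError("quadtree split below a binary split")
        hw, hh = w // 2, h // 2
        origins = [(x, y), (x + hw, y), (x, y + hh), (x + hw, y + hh)]
        leaves = []
        for (cx, cy), child in zip(origins, children):
            leaves += leaf_cus(cx, cy, hw, hh, child)
        return leaves
    if kind == "H":   # symmetric horizontal split: top half and bottom half
        hh = h // 2
        return (leaf_cus(x, y, w, hh, children[0], True)
                + leaf_cus(x, y + hh, w, hh, children[1], True))
    if kind == "V":   # symmetric vertical split: left half and right half
        hw = w // 2
        return (leaf_cus(x, y, hw, h, children[0], True)
                + leaf_cus(x + hw, y, hw, h, children[1], True))
    raise ValueError("unknown split kind")
```

Each returned rectangle is a CU that is used as-is for prediction and transform, which is why square and rectangular CUs can both appear among the leaves.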

In some possible implementations, video encoder 20 may use a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other possible implementations, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
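
As an illustration of a predefined scan order, the sketch below serializes a square block of quantized coefficients with a classic zigzag scan. The zigzag pattern is chosen here only as a familiar example and is an assumption; the text above does not say which scan order is used, and the function names are hypothetical.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    def key(pos):
        r, c = pos
        d = r + c                      # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (d, r if d % 2 else c)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)

def scan(block):
    """Serialize a square 2-D coefficient block into a 1-D list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

Because quantized coefficients tend to be larger near the top-left corner, a scan of this kind usually places the zeros at the end of the vector, which suits the entropy coding stage.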

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, the use of VLC can save bit rate relative to using equal-length codewords for each symbol to be transmitted. The probability in CABAC can be determined based on the context assigned to the symbol.
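
The bit-rate saving of variable length codes can be checked with a toy example. The four-symbol alphabet, its probabilities, and the prefix code below are invented for illustration and are not taken from any CAVLC table.

```python
# Hypothetical source: symbol probabilities and a matching prefix-free code.
PROB = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
VLC = {"a": "0", "b": "10", "c": "110", "d": "111"}   # shorter codes for likelier symbols
FIXED = {"a": "00", "b": "01", "c": "10", "d": "11"}  # equal-length codewords

def expected_length(code, prob):
    """Average number of bits per symbol for a code under the given probabilities."""
    return sum(prob[s] * len(code[s]) for s in code)

def encode(symbols, code):
    """Concatenate the codewords of a symbol sequence into a bit string."""
    return "".join(code[s] for s in symbols)
```

Under these assumed probabilities the variable length code averages 1.75 bits per symbol, against 2 bits for the fixed-length code.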

In an embodiment of the present application, a video encoder may perform inter prediction to reduce temporal redundancy between images. As described above, a CU may have one or more prediction units (PUs), as specified by different video compression codec standards. In other words, multiple PUs may belong to the CU, or the PU and the CU may be the same size. When the CU and the PU are the same size, the CU is either not partitioned or is partitioned into a single PU, and the term PU is used uniformly herein. When the video encoder performs inter prediction, the video encoder may signal motion information for the PU to the video decoder. Exemplarily, the motion information of the PU may include a reference image index, a motion vector, and a prediction direction identifier. The motion vector may indicate a displacement between the image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference block of the PU. The reference block of the PU may be a portion of a reference image that is similar to the image block of the PU. The reference block may be located in the reference image indicated by the reference image index and the prediction direction identifier.

In order to reduce the number of coded bits required to represent the motion information of the PU, the video encoder may generate a candidate motion information list (hereinafter referred to as a candidate list) for each PU according to the merge prediction mode or the advanced motion vector prediction (AMVP) mode. Each candidate in the candidate list for the PU may represent a set of motion information. The motion information may include a motion vector (MV) and reference image indication information. Of course, the motion information may also include only one of the two or both. For example, if the encoder side and the decoder side agree on the reference image in advance, the motion information may include only the motion vector. The motion information represented by some of the candidates in the candidate list may be based on motion information of other PUs. If a candidate represents the motion information of one of the specified spatial candidate positions or temporal candidate positions, the present application may refer to the candidate as "original" candidate motion information. For example, for the merge mode, also referred to herein as the merge prediction mode, there may be five original spatial candidate positions and one original temporal candidate position. In some examples, the video encoder may also generate additional candidate motion information by some means, such as inserting a zero motion vector as candidate motion information. Such additional candidate motion information is not considered original candidate motion information and may be referred to in this application as later-generated or artificially generated candidate motion information.

The techniques of the present application generally relate to techniques for generating a candidate list at a video encoder and techniques for generating the same candidate list at a video decoder. The video encoder and video decoder may generate the same candidate list by implementing the same techniques used to construct the candidate list. For example, both a video encoder and a video decoder can construct a list with the same number of candidates (e.g., five candidates). The video encoder and decoder may first consider spatial candidates (e.g., neighboring blocks in the same image), then consider temporal candidates (e.g., candidates in different images), and finally may consider artificially generated candidates, until the required number of candidates has been added to the list. In accordance with the techniques of the present application, a pruning operation may be applied to certain types of candidate motion information to remove duplicates from the candidate list during candidate list construction, while for other types of candidates, pruning may be skipped to reduce decoder complexity. For example, for a set of spatial candidates and for temporal candidates, a pruning operation may be performed to exclude candidates with repeated motion information from the candidate list. However, when an artificially generated candidate is added to the candidate list, it may be added without performing a pruning operation on it.
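
The construction order described above (spatial candidates, then temporal candidates, then artificially generated candidates, with pruning applied only to the first two kinds) can be sketched as follows. The `Motion` tuple, the function name, and the use of a zero motion vector with reference index 0 as the only artificial candidate are simplifying assumptions for this illustration.

```python
from typing import NamedTuple

class Motion(NamedTuple):
    """One set of candidate motion information (hypothetical representation)."""
    mv: tuple       # (horizontal, vertical) motion vector components
    ref_idx: int    # reference image index

def build_candidate_list(spatial, temporal, target=5):
    """Build a candidate list of exactly `target` entries.

    `spatial` and `temporal` are sequences in their preset checking order;
    an entry of None stands for an unavailable reference block.
    """
    candidates = []
    for cand in list(spatial) + list(temporal):
        if len(candidates) == target:
            break
        # Pruning: skip original candidates that repeat motion information.
        if cand is not None and cand not in candidates:
            candidates.append(cand)
    # Artificially generated candidates are appended without pruning.
    while len(candidates) < target:
        candidates.append(Motion((0, 0), 0))
    return candidates
```

Because the encoder and the decoder would run the same routine on the same neighboring data, both sides obtain identical lists, which is what allows a bare index to identify the selected candidate.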

After generating a candidate list for the PU of the CU, the video encoder may select candidate motion information from the candidate list and output an index identifier indicating the selected candidate motion information in the code stream. The selected candidate motion information may be motion information having a prediction block that produces the closest match to the PU being decoded. The aforementioned index identification may indicate the location of the candidate motion information selected in the candidate list. The video encoder may also generate a prediction block for the PU based on the reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the selected candidate motion information. For example, in the merge mode, it is determined that the selected candidate motion information is the motion information of the PU. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the selected candidate motion information. The video encoder may generate one or more residual image blocks (abbreviated as residual blocks) for the CU based on the predictive image blocks of the PU of the CU (referred to as prediction blocks for short) and the original image blocks for the CU. The video encoder may then encode one or more residual blocks and output a code stream.

The code stream may include data for identifying selected candidate motion information in the candidate list of PUs. The video decoder may determine motion information for the PU based on the selected candidate motion information in the candidate list of PUs. The video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate a prediction block for the PU based on one or more reference blocks of the PU. The video decoder may reconstruct an image block for the CU based on the prediction block for the PU of the CU and one or more residual blocks for the CU.

For ease of explanation, the present application may describe a location or image block as having various spatial relationships with a CU or PU. This description may be interpreted to mean that the location or image block and the image block associated with the CU or PU have various spatial relationships. In addition, the present application may refer to a PU that is currently being decoded by a video decoder as a current PU, also referred to as a current image block to be processed. The present application may refer to a CU currently being decoded by a video decoder as a current CU. The present application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used uniformly.

As briefly described above, video encoder 20 may use inter prediction to generate prediction blocks and motion information for PUs of the CU. In some examples, the motion information of the PU may be the same or similar to the motion information of one or more neighboring PUs (ie, PUs whose image blocks are spatially or temporally near the image block of the PU). Because neighboring PUs often have similar motion information, video encoder 20 may encode motion information for the PU with reference to motion information of neighboring PUs. Encoding the motion information of the PU with reference to the motion information of the neighboring PU may reduce the number of coded bits required in the code stream indicating the motion information of the PU.

Video encoder 20 may encode motion information for the PU with reference to motion information of neighboring PUs in various manners. For example, video encoder 20 may indicate that the motion information of the PU is the same as the motion information of a neighboring PU. The present application may use the term merge mode to indicate that the motion information of the PU is the same as, or can be derived from, the motion information of a neighboring PU. In another possible implementation, video encoder 20 may calculate a motion vector difference (MVD) for the PU. The MVD indicates the difference between the motion vector of the PU and the motion vector of a neighboring PU. Video encoder 20 may include the MVD, instead of the motion vector of the PU, in the motion information of the PU. Representing the MVD in the code stream may require fewer coded bits than representing the motion vector of the PU. The present application may use the term advanced motion vector prediction (AMVP) mode to refer to signaling the motion information of the PU to the decoding end by using the MVD and an index value identifying a candidate (i.e., candidate motion information).
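
The difference between the two signaling modes can be made concrete with a short sketch. Motion vectors are plain (x, y) tuples here, the function names are hypothetical, and choosing the predictor with the smallest absolute MVD is an assumption made for this example rather than a rule stated in the text.

```python
def encode_amvp(mv, candidates):
    """Return (candidate index, MVD) for a motion vector under AMVP-style signaling."""
    def mvd_magnitude(i):
        return abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1])
    # Assumed criterion: pick the predictor that leaves the smallest difference.
    idx = min(range(len(candidates)), key=mvd_magnitude)
    mvp = candidates[idx]
    return idx, (mv[0] - mvp[0], mv[1] - mvp[1])

def decode_amvp(idx, mvd, candidates):
    """Recover the motion vector as predictor plus MVD."""
    mvp = candidates[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

def decode_merge(idx, candidates):
    """In merge mode the selected candidate is used directly; no MVD is sent."""
    return candidates[idx]
```

In both cases only small values (an index, and for AMVP a difference) need to be written to the code stream instead of the full motion vector.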

In order to use the merge mode or the AMVP mode to signal the motion information of the PU to the decoder, video encoder 20 may generate a candidate list for the PU. The candidate list may include one or more candidates (i.e., one or more sets of candidate motion information). Each candidate in the candidate list for the PU represents a set of motion information. The set of motion information may include a motion vector, a reference image list, and a reference image index corresponding to the reference image list.

After generating the candidate list for the PU, video encoder 20 may select one candidate from the candidate list for the PU. For example, the video encoder can compare each candidate with the PU being encoded and can select the candidate that yields the desired rate-distortion cost. Video encoder 20 may output a candidate index for the PU. The candidate index identifies the position of the selected candidate in the candidate list.
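
One common way to express a rate-distortion cost is J = D + λ·R, where D is the distortion, R is the number of bits, and λ is a weighting factor. The sketch below selects the candidate index that minimizes this cost. The cost formula, the callables `distortion` and `bits`, and the function name are assumptions for illustration, since the text above does not define how the cost is computed.

```python
def select_candidate(candidates, distortion, bits, lam):
    """Return (index, cost) of the candidate with the lowest cost D + lam * R.

    distortion(candidate) and bits(index) are caller-supplied functions.
    """
    costs = [distortion(c) + lam * bits(i) for i, c in enumerate(candidates)]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

A usage example with made-up numbers: for three candidates with distortions 100, 90, and 95, an index cost of `i + 1` bits, and λ = 4, the costs are 104, 98, and 107, so index 1 is selected.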

Moreover, video encoder 20 may generate a prediction block for the PU based on the reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the selected candidate motion information in the candidate list for the PU.

When video decoder 30 receives the codestream, video decoder 30 may generate a candidate list for each of the PUs of the CU. The candidate list generated by video decoder 30 for the PU may be the same as the candidate list generated by video encoder 20 for the PU. The syntax elements parsed from the code stream may indicate the location of the candidate motion information selected in the candidate list of PUs. After generating the candidate list for the PU, video decoder 30 may generate a prediction block for the PU based on one or more reference blocks indicated by the motion information of the PU. Video decoder 30 may determine motion information for the PU based on candidate motion information selected in the candidate list for the PU. Video decoder 30 may reconstruct an image block for the CU based on the prediction block for the PU and the residual block for the CU.

It should be understood that, in a feasible implementation, at the decoding end, constructing the candidate list and parsing the position of the selected candidate in the candidate list from the code stream are independent of each other, and may be performed in any order or in parallel.

In another feasible implementation, at the decoding end, the position of the selected candidate in the candidate list is first parsed from the code stream, and the candidate list is then constructed according to the parsed position. In this implementation, the complete candidate list need not be constructed; the list only needs to be constructed up to the parsed position, at which point the candidate at that position can be determined. For example, when parsing the code stream shows that the selected candidate is the candidate whose index identifier is 3 in the candidate list, only the part of the candidate list from index 0 to index 3 needs to be constructed to determine the candidate whose index identifier is 3. This can reduce complexity and improve decoding efficiency.
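
The early-termination idea can be sketched with a generator that produces candidates lazily, so that construction stops as soon as the parsed index is reached. The candidate representation (a motion vector paired with a reference index), the zero-motion filler, and the function names are hypothetical.

```python
from itertools import islice

ZERO = ((0, 0), 0)  # assumed artificial candidate: zero motion vector, reference index 0

def iter_candidates(spatial, temporal):
    """Yield candidates in list order, one at a time."""
    seen = []
    for cand in list(spatial) + list(temporal):
        # None marks an unavailable block; duplicates are pruned.
        if cand is not None and cand not in seen:
            seen.append(cand)
            yield cand
    while True:          # artificial candidates fill any remaining positions
        yield ZERO

def candidate_at(index, spatial, temporal):
    """Construct the list only up to `index` and return the candidate there."""
    return next(islice(iter_candidates(spatial, temporal), index, None))
```

With this arrangement a parsed index of 0 touches only the first available reference block, and later positions in the list are never evaluated.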

FIG. 2 is a schematic block diagram of a video encoder 20 in the embodiment of the present application. Video encoder 20 may perform intra coding and inter coding of video blocks within a video slice. Intra coding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or image. Inter coding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames or images of a video sequence. The intra mode (I mode) may refer to any of several spatial-based compression modes. An inter mode such as unidirectional prediction (P mode) or bidirectional prediction (B mode) may refer to any of several temporal-based compression modes.

In the possible embodiment of FIG. 2, video encoder 20 includes a partitioning unit 35, a prediction unit 41, a reference image memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction unit 41 includes an inter prediction unit (not shown) and an intra prediction unit 46. The inter prediction unit may include a motion estimation unit 42 and a motion compensation unit 44. For video block reconstruction, video encoder 20 may also include inverse quantization unit 58, inverse transform unit 60, and a summer (also referred to as reconstructor) 62. A deblocking filter (not shown in Figure 2) may also be included to filter the block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 as needed. In addition to the deblocking filter, an additional loop filter (in-loop or post-loop) can also be used.

As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, image blocks, or other larger units, as well as video block partitioning, for example, according to the quadtree structure of LCUs and CUs. Video encoder 20 exemplarily illustrates the components that encode video blocks within a video slice to be encoded. In general, a slice may be partitioned into multiple video blocks (and possibly into sets of video blocks called image blocks).

Prediction unit 41 may select one of a plurality of possible coding modes for the current video block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, based on the encoding quality and a cost calculation result (e.g., rate-distortion cost, RD cost). Prediction unit 41 may provide the resulting intra-coded or inter-coded block to summer 50 to generate residual block data, and provide the resulting intra-coded or inter-coded block to summer 62 to reconstruct the coded block for use as a reference image.

Inter prediction units (e.g., motion estimation unit 42 and motion compensation unit 44) within prediction unit 41 perform inter-predictive coding of the current video block relative to one or more prediction blocks in one or more reference images to provide temporal compression. Motion estimation unit 42 is configured to determine an inter prediction mode for a video slice according to a predetermined pattern of the video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within the current video frame or image relative to a prediction block within a reference image.

A prediction block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or another difference metric. In some possible implementations, video encoder 20 may calculate values for sub-integer pixel positions of a reference image stored in reference image memory 64. For example, video encoder 20 may interpolate values at quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of the reference image. Accordingly, motion estimation unit 42 may perform a motion search relative to the full pixel positions and the fractional pixel positions and output a motion vector with fractional pixel precision.
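
The two difference metrics named above, together with a brute-force search over integer pixel positions, can be sketched as follows. Blocks and frames are lists of rows of sample values; the function names are hypothetical, and fractional-pixel interpolation is deliberately left out of this sketch.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block_at(frame, x, y, w, h):
    """Extract the w x h block whose top-left sample is at column x, row y."""
    return [row[x:x + w] for row in frame[y:y + h]]

def full_search(cur_block, ref_frame, x0, y0, search_range):
    """Return ((dx, dy), cost) of the best integer-pixel match by SAD.

    (x0, y0) is the position of the current block; displacements of up to
    search_range samples in each direction are tested.
    """
    h, w = len(cur_block), len(cur_block[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            # Skip displacements that point outside the reference image.
            if x < 0 or y < 0 or x + w > len(ref_frame[0]) or y + h > len(ref_frame):
                continue
            cost = sad(cur_block, block_at(ref_frame, x, y, w, h))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The returned displacement plays the role of the motion vector; a practical encoder would typically refine it at fractional positions and use faster search patterns than this exhaustive loop.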

Motion estimation unit 42 calculates the motion vector of the PU of the video block in the inter-coded slice by comparing the location of the PU with the location of the prediction block of the reference picture. The reference images may be selected from a first reference image list (List 0) or a second reference image list (List 1), each of the lists identifying one or more reference images stored in the reference image memory 64. Motion estimation unit 42 transmits the computed motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation performed by motion compensation unit 44 may involve extracting or generating a prediction block based on motion vectors determined by motion estimation. After receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the prediction block pointed to by the motion vector in one of the reference picture lists. The video encoder 20 forms a residual video block by subtracting the pixel value of the prediction block from the pixel value of the current video block being decoded, thereby forming a pixel difference value. The pixel difference values form residual data for the block and may include both luminance and chrominance difference components. Summer 50 represents one or more components that perform this subtraction. Motion compensation unit 44 may also generate syntax elements associated with video blocks and video slices for video decoder 30 to use to decode video blocks of video slices.

If the PU is located in a B slice, the image containing the PU may be associated with two reference image lists called "List 0" and "List 1". In some possible implementations, an image containing a B slice may be associated with a list combination that is a combination of List 0 and List 1.

Furthermore, if the PU is located in a B slice, motion estimation unit 42 may perform unidirectional prediction or bidirectional prediction for the PU. In some possible implementations, bidirectional prediction is prediction performed based on images in the List 0 and List 1 reference image lists, respectively; in other possible implementations, bidirectional prediction is prediction performed based on a reconstructed future frame and a reconstructed past frame, in display order, relative to the current frame, respectively. When motion estimation unit 42 performs unidirectional prediction for the PU, motion estimation unit 42 may search for a reference block for the PU in the reference images of list 0 or list 1. Motion estimation unit 42 may then generate a reference index indicating the reference image in list 0 or list 1 that contains the reference block, and a motion vector indicating the spatial displacement between the PU and the reference block. Motion estimation unit 42 may output the reference index, a prediction direction identifier, and the motion vector as the motion information of the PU. The prediction direction identifier may indicate whether the reference index refers to a reference image in list 0 or in list 1. Motion compensation unit 44 may generate a predictive image block of the PU based on the reference block indicated by the motion information of the PU.

When motion estimation unit 42 performs bidirectional prediction for the PU, motion estimation unit 42 may search for a reference block for the PU in the reference images in list 0 and may also search for another reference block for the PU in the reference images in list 1. Motion estimation unit 42 may then generate reference indexes indicating the reference images in list 0 and list 1 that contain the reference blocks, and motion vectors indicating the spatial displacements between the reference blocks and the PU. Motion estimation unit 42 may output the reference indexes and the motion vectors of the PU as the motion information of the PU. Motion compensation unit 44 may generate a predictive image block of the PU based on the reference blocks indicated by the motion information of the PU.
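
The motion information for unidirectional and bidirectional prediction can be pictured as one record with an optional entry per reference image list. The class, its field names, and the derived `direction` property are hypothetical. The averaging function shows one simple way two reference blocks could be combined; the text above does not specify how the bidirectional prediction block is formed, so that part is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class MotionInfo:
    """Motion information of a PU; an unused list keeps its fields as None."""
    mv_l0: Optional[Tuple[int, int]] = None   # motion vector into list 0
    ref_idx_l0: Optional[int] = None          # reference index within list 0
    mv_l1: Optional[Tuple[int, int]] = None   # motion vector into list 1
    ref_idx_l1: Optional[int] = None          # reference index within list 1

    @property
    def direction(self):
        """Prediction direction implied by which lists are in use."""
        use0, use1 = self.mv_l0 is not None, self.mv_l1 is not None
        if use0 and use1:
            return "BI"
        if use0:
            return "L0"
        if use1:
            return "L1"
        raise ValueError("no reference list in use")

def bi_average(block0, block1):
    """Combine two reference blocks by per-sample averaging, rounding half up."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(block0, block1)]
```

A unidirectional PU fills one pair of fields and a bidirectional PU fills both, so a single structure can describe either prediction type.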

In some possible implementations, motion estimation unit 42 does not output a complete set of motion information for the PU to entropy encoding unit 56. Rather, motion estimation unit 42 may signal the motion information of the PU with reference to motion information of another PU. For example, motion estimation unit 42 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighboring PU. In this implementation, motion estimation unit 42 may indicate, in a syntax structure associated with the PU, an indication value that indicates to video decoder 30 that the PU has the same motion information as the neighboring PU or has motion information that can be derived from the neighboring PU. In another implementation, motion estimation unit 42 may identify, in a syntax structure associated with the PU, a candidate associated with a neighboring PU and a motion vector difference (MVD). The MVD indicates the difference between the motion vector of the PU and the indicated candidate associated with the neighboring PU. Video decoder 30 may use the indicated candidate and the MVD to determine the motion vector of the PU.

As described above, prediction unit 41 may generate a candidate list for each PU of the CU. One or more of the candidate lists may include one or more sets of original candidate motion information and one or more sets of additional candidate motion information derived from the original candidate motion information.

Intra prediction unit 46 within prediction unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same image or slice as the current block to be coded, to provide spatial compression. Thus, as an alternative to the inter prediction performed by motion estimation unit 42 and motion compensation unit 44 (as described above), intra prediction unit 46 may intra-predict the current block. In particular, intra prediction unit 46 may determine an intra prediction mode to use to encode the current block. In some possible implementations, intra prediction unit 46 may encode the current block using various intra prediction modes, for example, during separate encoding passes, and intra prediction unit 46 (or, in some possible implementations, mode selection unit 40) may select an appropriate intra prediction mode to use from the tested modes.

After prediction unit 41 generates a prediction block of the current video block via inter prediction or intra prediction, video encoder 20 forms a residual video block by subtracting the prediction block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using, for example, a discrete cosine transform (DCT) or a conceptually similar transform (e.g., a discrete sine transform (DST)). Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain (e.g., a frequency domain).

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameters. In some possible implementations, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform a scan.

After quantization, entropy encoding unit 56 may entropy encode the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being coded. After entropy encoding by entropy encoding unit 56, the encoded code stream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30.

Entropy encoding unit 56 may encode information indicative of a selected intra prediction mode in accordance with the techniques of the present application. Video encoder 20 may include, in the transmitted code stream, configuration data, which may include a plurality of intra prediction mode index tables and a plurality of modified intra prediction mode index tables (also referred to as codeword mapping tables), definitions of the encoding contexts of various blocks, and indications of the most probable mode (MPM), the intra prediction mode index table, and the modified intra prediction mode index table to use for each of the contexts.

Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block for the reference image. Motion compensation unit 44 may calculate the reference block by adding the residual block to a prediction block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reference block for storage in reference image memory 64. The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as reference blocks to inter-predict subsequent video frames or blocks in the image.

It should be understood that other structural variations of video encoder 20 may be used to encode the video stream. For example, for certain image blocks or image frames, video encoder 20 may directly quantize the residual signal without processing by transform unit 52, and accordingly without processing by inverse transform unit 60; or, for certain image blocks or image frames, video encoder 20 does not generate residual data, and accordingly processing by transform unit 52, quantization unit 54, inverse quantization unit 58, and inverse transform unit 60 is not needed; or, quantization unit 54 and inverse quantization unit 58 of video encoder 20 may be combined.

FIG. 3 is a schematic block diagram of a video decoder 30 in the embodiment of the present application. In the possible implementation of FIG. 3, video decoder 30 includes an entropy decoding unit 80, a prediction unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference image memory 92. In a variant, the reference image memory 92 can also be placed outside of the video decoder 30. The prediction unit 81 includes an inter prediction unit (not shown) and an intra prediction unit 84. The inter prediction unit may be, for example, a motion compensation unit 82. In some possible implementations, video decoder 30 may perform a decoding process that is generally reciprocal to the encoding flow described with respect to video encoder 20 of FIG. 2.

During the decoding process, video decoder 30 receives from video encoder 20 an encoded video code stream representing the video blocks of an encoded video slice and associated syntax elements. Entropy decoding unit 80 of video decoder 30 entropy decodes the code stream to produce quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to the prediction unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra prediction unit 84 of prediction unit 81 may generate prediction data for a video block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.

When the video image is coded as an inter-coded (e.g., B, P, or GPB) slice, motion compensation unit 82 of prediction unit 81 generates prediction blocks for the video blocks of the current video image based on the motion vectors and other syntax elements received from entropy decoding unit 80. The prediction block may be generated from one of the reference images within one of the reference image lists. Video decoder 30 may construct the reference image lists (List 0 and List 1) using default construction techniques based on reference images stored in reference image memory 92.

Motion compensation unit 82 determines the prediction information for the video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate a prediction block of the current video block that is being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra prediction or inter prediction) used to code the video blocks of the video slice, an inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information for one or more of the reference image lists of the slice, the motion vectors of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information for decoding the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on the interpolation filter. Motion compensation unit 82 may use the interpolation filters as used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of the reference block. In this application, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use an interpolation filter to generate the prediction blocks.

If the PU is encoded using inter prediction, motion compensation unit 82 may generate a candidate list for the PU. Data identifying the location of the selected candidate in the candidate list of the PU may be included in the code stream. After generating the candidate list for the PU, motion compensation unit 82 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU. The reference block of the PU may be in a different time image than the PU. Motion compensation unit 82 may determine motion information for the PU based on the selected motion information from the candidate list of PUs.

Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the code stream and decoded by entropy decoding unit 80. The inverse quantization process may include using the quantization parameter calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to produce a residual block in the pixel domain.

After motion compensation unit 82 generates a prediction block for the current video block based on the motion vector and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 88 with the corresponding prediction block generated by motion compensation unit 82. Summer 90 (i.e., the reconstructor) represents one or more components that perform this summation operation. A deblocking filter can also be applied to filter the decoded blocks to remove blocking artifacts as needed. Other loop filters (either in the decoding loop or after the decoding loop) can also be used to smooth pixel transitions or otherwise improve video quality. The decoded video block in a given frame or image is then stored in reference image memory 92, which stores reference images used for subsequent motion compensation. The reference image memory 92 also stores decoded video for later presentation on a display device such as display device 32 of FIG. 1.
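The summation performed by summer 90 can be sketched, purely for illustration, as a sample-wise addition of the prediction block and the residual block. The clipping to the valid sample range and the default bit depth of 8 are assumptions of this sketch, and the function name reconstruct_block is hypothetical.

```python
from typing import List

Block = List[List[int]]


def reconstruct_block(pred: Block, resid: Block, bit_depth: int = 8) -> Block:
    """Add the residual to the prediction sample by sample.

    Assumption: results are clipped to [0, 2**bit_depth - 1]; the text only
    states that the two blocks are summed.
    """
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


if __name__ == "__main__":
    prediction = [[250, 10], [128, 0]]
    residual = [[10, -20], [3, 0]]
    print(reconstruct_block(prediction, residual))
```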

As noted above, the techniques of the present application illustratively relate to inter-frame coding. It should be understood that the techniques of the present application can be performed by any of the video coders described in this application, including, for example, video encoder 20 and video decoder 30 as shown and described with respect to FIGS. 1 through 3. That is, in one possible implementation, the prediction unit 41 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of blocks of video data. In another possible implementation, the prediction unit 81 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of blocks of video data. Thus, references to a generic "video encoder" or "video decoder" may include video encoder 20, video decoder 30, or another video encoding or decoding unit.

It should be understood that other structural variations of video decoder 30 may be used to decode the encoded video bitstream. For example, for certain image blocks or image frames, entropy decoding unit 80 of video decoder 30 does not decode the quantized coefficients, and accordingly does not need to be processed by inverse quantization unit 86 and inverse transform unit 88.

FIG. 4 is an exemplary flowchart in which a video encoder (e.g., video encoder 20) encodes motion information of a current image block (e.g., a current PU or a current CU) by performing a merge operation 200 in an embodiment of the present application. In other feasible implementations, the video encoder may perform a merge operation other than merge operation 200. For example, in other possible implementations, the video encoder may perform a merge operation that has more steps than, or steps different from, merge operation 200. In other possible implementations, the video encoder may perform the steps of merge operation 200 in a different order or in parallel. The encoder may also perform merge operation 200 on a PU encoded in skip mode.

After the video encoder begins the merge operation 200, the video encoder may generate a candidate list for the current PU (202). The video encoder can generate a candidate list for the current PU in various ways. For example, the video encoder may generate a candidate list for the current PU according to one of the example techniques described below with respect to FIGS. 6A, 6B-10.

As described above, the candidate list for the current PU may include temporal candidate motion information (referred to as a temporal candidate). The temporal candidate motion information may indicate motion information of a time-domain co-located PU. The co-located PU may be spatially co-located with the current PU at the same location in the image frame, but in the reference image rather than the current image. The present application may refer to the reference image that includes the co-located PU as the associated reference image, and to the reference image index of the associated reference image as the associated reference image index. As described above, the current image may be associated with one or more reference image lists (e.g., list 0, list 1, etc.). The reference image index may indicate the reference image by indicating the position of the reference image in a certain reference image list. In some possible implementations, the current image can be associated with a combined reference image list.

In some video encoders, the associated reference image index is the reference image index of the PU that covers the reference index source location associated with the current PU. In these video encoders, the reference index source location associated with the current PU is adjacent to the current PU. In the present application, a PU may "cover" a particular location if the image block associated with the PU includes that location.

However, there may be an example where the reference index source location associated with the current PU is within the current CU. In these examples, if the PU is above or to the left of the current CU, the PU that covers the reference index source location associated with the current PU may be considered available. However, the video encoder may need to access motion information of another PU of the current CU in order to determine a reference image containing the co-located PU. Accordingly, these video encoders may use motion information (ie, reference image index) of PUs belonging to the current CU to generate temporal candidates for the current PU. In other words, these video encoders can generate temporal candidates using motion information for PUs belonging to the current CU. Accordingly, the video encoder cannot generate a candidate list for the current PU and the PU that covers the reference index source location associated with the current PU in parallel.

In accordance with the techniques of the present application, a video encoder can explicitly set the associated reference image index without reference to the reference image index of any other PU. This may enable the video encoder to generate candidate lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the associated reference image index, the associated reference image index is not based on motion information of any other PU of the current CU. In some possible implementations in which the video encoder explicitly sets the associated reference image index, the video encoder may always set the associated reference image index to a fixed, predefined reference image index (e.g., 0). In this way, the video encoder may generate a temporal candidate based on the motion information of the co-located PU in the reference frame indicated by the preset reference image index, and may include the temporal candidate in the candidate list of the current CU.

In a possible implementation where the video encoder explicitly sets the associated reference image index, the video encoder may explicitly signal the associated reference image index in a syntax structure (e.g., an image header, a slice header, an APS, or another syntax structure). In this possible implementation, the video encoder may signal to the decoder the associated reference image index for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the associated reference image index for each PU of the CU is equal to "1."

In some possible implementations, the associated reference image index can be set implicitly rather than explicitly. In these possible implementations, the video encoder may generate each temporal candidate in the candidate list for a PU of the current CU using the motion information of the PU in the reference image indicated by the reference image index of a PU covering a location outside the current CU, even if these locations are not strictly adjacent to the current PU.

After generating the candidate list for the current PU, the video encoder may generate a predictive image block associated with each candidate in the candidate list (204). The video encoder may generate the predictive image block associated with a candidate by determining motion information of the current PU based on the motion information of the indicated candidate and then generating a predictive image block based on the one or more reference blocks indicated by the motion information of the current PU. The video encoder may select one of the candidates from the candidate list (206). The video encoder can select candidates in a variety of ways. For example, the video encoder may select one of the candidates based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidates.

After selecting the candidate, the video encoder may output an index of the candidate (208). The index may indicate the location of the selected candidate in the candidate list. In some possible implementations, the index can be expressed as "merge_idx".
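Steps 204 through 208 of merge operation 200 can be sketched, for illustration only, as a loop that generates a predictive block for each candidate, evaluates a cost, and returns the index of the cheapest candidate. The cost model below (sum of absolute differences plus a rate term proportional to the index) is an assumption standing in for the rate-distortion analysis, which the present application does not specify; the names Candidate, sad, and select_merge_candidate and the predict callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference image index


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two flattened sample blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_merge_candidate(
    current_block: Sequence[int],
    candidates: List[Candidate],
    predict: Callable[[Candidate], Sequence[int]],
    lam: float = 1.0,
) -> int:
    """Return the index (merge_idx) of the candidate with the lowest cost."""
    best_idx, best_cost = -1, float("inf")
    for idx, cand in enumerate(candidates):
        pred = predict(cand)  # step 204: predictive block for this candidate
        # Assumed cost: distortion plus a toy rate term that grows with the index.
        cost = sad(current_block, pred) + lam * (idx + 1)
        if cost < best_cost:  # step 206: keep the cheapest candidate
            best_idx, best_cost = idx, cost
    return best_idx           # step 208: index to be output


if __name__ == "__main__":
    cands = [Candidate((0, 0), 0), Candidate((1, 0), 0), Candidate((0, 1), 0)]
    toy_predictions = {
        cands[0]: [0, 0, 0, 0],
        cands[1]: [10, 20, 33, 44],
        cands[2]: [10, 20, 30, 40],
    }
    print(select_merge_candidate([10, 20, 30, 40], cands, lambda c: toy_predictions[c]))
```

Only the index is returned because, in merge mode, the decoder rebuilds the same candidate list and recovers the motion information from that index.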

FIG. 5 is an exemplary flow diagram of motion compensation performed by a video decoder (e.g., video decoder 30) in an embodiment of the present application.

When the video decoder performs motion compensation operation 220, the video decoder may receive an indication for the selected candidate for the current PU (222). For example, the video decoder may receive a candidate index indicating the location of the selected candidate within the current PU's candidate list.

If the motion information of the current PU is encoded using the synthetic merge mode and the current PU is bi-predicted, the video decoder may receive the first candidate index and the second candidate index. The first candidate index indicates the location of the selected candidate for the list 0 motion vector of the current PU in the candidate list. The second candidate index indicates the location of the selected candidate for the list 1 motion vector for the current PU in the candidate list. In some possible implementations, a single syntax element can be used to identify two candidate indices.

Additionally, the video decoder can generate a candidate list for the current PU (224). The video decoder can generate this candidate list for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to FIGS. 6A, 6B-10 to generate a candidate list for the current PU. When the video decoder generates a temporal candidate for the candidate list, the video decoder may explicitly or implicitly set a reference image index identifying the reference image that includes the co-located PU, as previously described with respect to FIG. 4.

After generating the candidate list for the current PU, the video decoder may determine motion information for the current PU based on the motion information indicated by the one or more selected candidates in the candidate list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidates and one or more MVDs indicated in the code stream. The reference image index and the prediction direction identifier of the current PU may be the same as the reference image index and the prediction direction identifier of the one or more selected candidates. After determining the motion information for the current PU, the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
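The difference between the merge mode and the AMVP mode in step 225 can be sketched as follows. This is a minimal illustration that models a single prediction direction; the names MotionInfo and derive_motion and the string mode labels are hypothetical and not taken from the present application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MotionInfo:
    mv: Tuple[int, int]  # motion vector (x, y)
    ref_idx: int         # reference image index
    direction: str       # prediction direction identifier, e.g. "L0" or "L1"


def derive_motion(
    candidates: List[MotionInfo],
    cand_idx: int,
    mode: str,
    mvd: Optional[Tuple[int, int]] = None,
) -> MotionInfo:
    """Derive the current PU's motion information from the selected candidate."""
    selected = candidates[cand_idx]
    if mode == "merge":
        # Merge mode: motion information is taken over from the candidate unchanged.
        return selected
    if mode == "amvp":
        if mvd is None:
            raise ValueError("AMVP mode requires a motion vector difference")
        # AMVP mode: the candidate is a predictor; the signaled MVD is added to it.
        mv = (selected.mv[0] + mvd[0], selected.mv[1] + mvd[1])
        return MotionInfo(mv, selected.ref_idx, selected.direction)
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    cand_list = [MotionInfo((4, -2), 0, "L0"), MotionInfo((1, 1), 1, "L0")]
    print(derive_motion(cand_list, 1, "merge"))
    print(derive_motion(cand_list, 0, "amvp", mvd=(-1, 3)))
```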

It should be noted that, in an implementation manner, when the video decoder generates the candidate list for the current PU (224), the process of collecting candidates may end once the number of available candidates collected equals the number of candidates determined by the received candidate index.
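This early termination can be sketched as follows, for illustration only. The sketch assumes that the candidate index is zero-based, so that an index k requires k + 1 available candidates; that assumption, the function name collect_until_index, and the representation of each candidate source as a callable returning None when unavailable are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

MV = Tuple[int, int]


def collect_until_index(
    probes: List[Callable[[], Optional[MV]]],
    cand_idx: int,
) -> List[MV]:
    """Collect available candidates in order, stopping once index cand_idx exists.

    Each probe returns a motion vector, or None if its block is unavailable.
    Probes after the stopping point are never evaluated.
    """
    needed = cand_idx + 1  # assumption: zero-based candidate index
    collected: List[MV] = []
    for probe in probes:
        mv = probe()
        if mv is None:
            continue       # unavailable block: skip it
        collected.append(mv)
        if len(collected) == needed:
            break          # the selected candidate is now known
    return collected


if __name__ == "__main__":
    sources = [lambda: (1, 0), lambda: None, lambda: (2, 2), lambda: (5, 5)]
    selected = collect_until_index(sources, 1)[-1]
    print(selected)
```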

FIG. 6A is an exemplary schematic diagram of a coding unit (CU) and the spatial neighboring image blocks and time-domain neighboring image blocks associated with it in the embodiment of the present application, illustrating CU 600 and exemplary candidate locations 1 through 10 associated with CU 600. Candidate locations 1 through 5 represent spatial candidates in the same image as CU 600. Candidate location 1 is located to the left of CU 600. Candidate location 2 is located above CU 600. Candidate location 3 is located at the upper right of CU 600. Candidate location 4 is located at the lower left of CU 600. Candidate location 5 is located at the upper left of CU 600. Candidate locations 6 through 10 represent temporal candidates associated with co-located block 602 of CU 600, where the co-located block is an image block in the reference image (i.e., an adjacent encoded image) that has the same size, shape, and coordinates as CU 600. Candidate location 6 is located at the lower right corner of co-located block 602. Candidate location 7 is located at the lower right middle of co-located block 602. Candidate location 8 is located at the upper left corner of co-located block 602. Candidate location 9 is located at the lower right corner of co-located block 602. Candidate location 10 is located at the upper left middle of co-located block 602. FIG. 6A is an illustrative implementation of candidate locations from which an inter prediction module (e.g., motion estimation unit 42 or motion compensation unit 82) may generate a candidate list.

It should be noted that the spatial candidate locations and the temporal candidate locations in FIG. 6A are merely illustrative, and the candidate locations include but are not limited to these. In some feasible implementations, the spatial candidate locations may also optionally include locations that are within a preset distance from the image block to be processed but not adjacent to it. Illustratively, locations of this type may be as shown by 6 to 27 in FIG. 6B. It should be understood that FIG. 6B is an exemplary schematic diagram of a coding unit and the spatial neighboring image blocks associated with it in the embodiment of the present application. The location of an image block that is in the same image frame as the image block to be processed, is not adjacent to the image block to be processed, and has already been reconstructed when the image block to be processed is processed also falls within the range of spatial candidate locations. An image block at such a location is referred to herein as a spatial non-contiguous image block; it should be understood that the spatial candidates may be taken from one or more of the locations shown in FIG. 6B.

FIG. 7 is a schematic flowchart showing an acquisition process 700 of candidate motion information of an image block according to an embodiment of the present application. Process 700 may be performed by video encoder 20 or video decoder 30, and in particular, may be performed by an inter prediction unit of video encoder 20 or an inter prediction unit of video decoder 30. In video encoder 20, the inter prediction unit may illustratively include motion estimation unit 42 and motion compensation unit 44. In video decoder 30, the inter prediction unit may illustratively include motion compensation unit 82. The inter prediction unit may generate a candidate motion information list for the PU. The candidate motion information list may include one or more sets of original candidate motion information and one or more sets of additional candidate motion information derived from the original candidate motion information. In other words, process 700 can include an acquisition process 710 of original candidate motion information and an acquisition process 730 of additional candidate motion information. Process 700 is described as a series of steps or operations; it should be understood that process 700 may be performed in various orders and/or concurrently and is not limited to the execution sequence shown in FIG. 7. Assuming that a video encoder or video decoder is being used on a video data stream having multiple video frames, process 700, which comprises the following steps, is performed to predict candidate motion information for a current image block of a current video frame:

Step 711: Detect one or more spatial reference blocks of the current image block according to a first preset sequence to obtain M sets of original candidate motion information in the candidate list of the image block to be processed (or to obtain M sets of original candidate motion information for constructing the candidate list of the image block to be processed), where M is an integer greater than or equal to 0.

It should be understood that the detection herein may include the "available" check referred to elsewhere herein, or may include both the "available" check and the trimming (e.g., de-redundancy) process described elsewhere herein; details are not repeated here.

Referring to FIGS. 6A and 6B, the one or more spatial reference blocks of the current image block include: one or more spatial reference blocks, in the image in which the current image block is located, that are adjacent to the current image block, and/or one or more spatial reference blocks, in the image in which the current image block is located, that are not adjacent to the image block to be processed. As shown in FIG. 6A, the one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located may include: a fourth spatial neighboring block A0 located at the lower left side of the current image block, a first spatial neighboring block A1 located on the left side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, or a fifth spatial neighboring block B2 located on the upper left side of the current image block. As shown in FIG. 6B, the one or more spatial reference blocks that are not adjacent to the image block to be processed in the image in which the current image block is located may include: a first spatial non-contiguous image block, a second spatial non-contiguous image block, a third spatial non-contiguous image block, and the like.

In an implementation manner, in step 711, whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available is detected sequentially, to obtain the motion information of M1 image blocks, among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, whose motion vectors have been determined, where M1 is an integer greater than or equal to 0.

In another implementation manner, in step 711, the first spatial non-contiguous image block, the second spatial non-contiguous image block, and the third spatial non-contiguous image block may also be detected for availability; for the physical meaning of "available", refer to the foregoing description, which is not repeated here. Let the motion vector of the first spatial neighboring block A1, the motion vector of the second spatial neighboring block B1, the motion vector of the third spatial neighboring block B0, the motion vector of the fourth spatial neighboring block A0, the motion vector obtained by the ATMVP technique, the motion vector of the fifth spatial neighboring block B2, and the motion vector obtained by the STMVP technique be denoted MVL, MVU, MVUR, MVDL, MVA, MVUL, and MVS, respectively, and let the motion vectors of the first spatial non-contiguous image block, the second spatial non-contiguous image block, and the third spatial non-contiguous image block be denoted MV0, MV1, and MV2, respectively. The candidates may then be checked in any of the following orders to obtain the M candidates (i.e., M candidate motion vectors) used to construct the candidate list:

Example 1: MVL, MVU, MVUR, MVDL, MV0, MV1, MV2, MVA, MVUL, MVS;

Example 2: MVL, MVU, MVUR, MVDL, MVA, MV0, MV1, MV2, MVUL, MVS;

Example 3: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;

Example 4: MVL, MVU, MVUR, MVDL, MVA, MVUL, MVS, MV0, MV1, MV2;

Example 5: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MVS, MV2;

Example 6: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MV2, MVS;

Example 7: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;

It should be understood that Examples 1 through 7 exemplarily show several possible M original candidate motion vectors for constructing a candidate list. Based on the motion vector of the spatial non-contiguous image block, there may be other ways of composing the candidate list and the arrangement of the candidates in the list, which are not limited.

It should be understood that the motion vectors (for example, MV0, MV1, and MV2) of different spatial non-contiguous image blocks may also have different arrangements, which is not limited in this embodiment of the present application.
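Checking candidates in one of the orders above can be sketched, for illustration only, as a single pass that skips unavailable motion vectors and prunes duplicates. The dictionary representation of availability, the stop at a target count, and the names EXAMPLE_1_ORDER, EXAMPLE_4_ORDER, and build_spatial_candidates are assumptions of this sketch rather than details of the present application.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MV = Tuple[int, int]

# Check orders copied from Example 1 and Example 4 above.
EXAMPLE_1_ORDER = ["MVL", "MVU", "MVUR", "MVDL", "MV0", "MV1", "MV2", "MVA", "MVUL", "MVS"]
EXAMPLE_4_ORDER = ["MVL", "MVU", "MVUR", "MVDL", "MVA", "MVUL", "MVS", "MV0", "MV1", "MV2"]


def build_spatial_candidates(
    motion_vectors: Dict[str, Optional[MV]],
    order: Sequence[str],
    target_count: int,
) -> List[MV]:
    """Walk the check order, keeping available, non-duplicate motion vectors.

    A missing key or a value of None means the corresponding block is unavailable.
    """
    candidates: List[MV] = []
    for name in order:
        mv = motion_vectors.get(name)
        if mv is None:
            continue  # "available" check failed
        if mv in candidates:
            continue  # trimming (de-redundancy)
        candidates.append(mv)
        if len(candidates) == target_count:
            break     # assumption: stop once the list is full
    return candidates


if __name__ == "__main__":
    mvs = {"MVL": (1, 0), "MVU": (1, 0), "MVUR": None,
           "MVDL": (0, 2), "MV0": (4, 4), "MVA": (3, 3)}
    print(build_spatial_candidates(mvs, EXAMPLE_1_ORDER, 4))
    print(build_spatial_candidates(mvs, EXAMPLE_4_ORDER, 3))
```

The same inputs yield different lists under different orders, which is why the position of MV0, MV1, and MV2 in the check order affects which candidates survive when the list is short.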

Compared with using only the spatial candidate locations described in FIG. 6A, also using the motion vectors of spatial non-contiguous image blocks as spatial candidates in the candidate list of the to-be-processed block exploits more spatial a priori coding information and thereby improves coding performance.

Step 713: Detect one or more time domain reference blocks of the current image block according to a second preset sequence to obtain L sets of original candidate motion information in the candidate list of the image block to be processed (or to obtain L sets of original candidate motion information for constructing the candidate list of the image block to be processed), where L is an integer greater than or equal to 0.

Referring to FIG. 6A, the one or more time domain reference blocks of the current image block may be understood as image blocks within the co-located block of the current image block or spatial neighboring blocks of that co-located block, and may include, for example: the lower-right spatial neighboring block H of the co-located block of the current image block, the upper-left intermediate block C0 of the co-located block, the lower-right intermediate block C3 of the co-located block, the upper-left block TL of the co-located block, or the lower-right block BR of the co-located block, where the co-located block is an image block of the reference image having the same size, shape, and coordinates as the current image block.

In an implementation manner, in step 713, the lower-right spatial neighboring block H of the co-located block and the lower-right intermediate block C3 of the co-located block are detected sequentially to obtain the motion information of L1 image blocks whose motion vectors have been determined; or

It should be understood that only a few possible sets of L original candidate motion information for constructing a candidate list are given exemplarily herein. There may be other ways of composing the candidate list and the arrangement of the candidates in the list, which are not limited.

It should be understood that the motion information of the different time domain reference blocks may also have different arrangement manners, which is not limited by the embodiment of the present application.

Further, in a specific implementation manner, the condition for detecting the time domain reference blocks other than the lower-right spatial neighboring block H of the co-located block may include: the lower-right spatial neighboring block H of the co-located block is unavailable, or the number of candidate motion information in the candidate list is less than the target number.

Step 731: When the number of candidate motion information in the candidate list of the image block to be processed is less than the target number, decompose at least one set of original candidate motion information of the bidirectional prediction type (also referred to as original candidate motion information of the bidirectional prediction encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type (also referred to as candidate motion information of the unidirectional prediction encoding/decoding mode) in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.

A set of original candidate motion information of the bidirectional prediction type may include motion information for a forward prediction direction and motion information for a backward prediction direction. The motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index. The motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

After the decomposition processing, the Q-group newly constructed unidirectional prediction type candidate motion information may include: a unidirectional prediction type is a set of motion information in a forward prediction direction and/or a unidirectional prediction type is a backward prediction direction. Group motion information, wherein the set of motion information of the forward prediction direction includes a first reference image list and a first reference image index corresponding to the first reference image list and a first reference corresponding to the first reference image index a motion vector of the image; the set of motion information of the backward prediction direction includes a second reference image list and a second reference image index corresponding to the second reference image list and a second reference corresponding to the second reference image index The motion vector of the image. It should be understood that if the newly constructed candidate motion information is repeated with existing candidates in the candidate list, the newly constructed candidate motion information does not need to be added to the candidate list.

In the embodiment of the present invention, the method further includes:

Step 733: When the number of candidate motion information in the candidate list of the image block to be processed is smaller than the target number, combine two sets of original candidate motion information of the unidirectional prediction type (unidirectional prediction encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type (candidate motion information of the bidirectional prediction encoding/decoding mode) in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.

The combination refers to combining motion information that uses a forward prediction encoding/decoding mode (that is, one set of original candidate motion information of the unidirectional prediction type) with motion information that uses a backward prediction encoding/decoding mode (that is, another set of original candidate motion information of the unidirectional prediction type) to obtain motion information that uses a bidirectional prediction encoding/decoding mode (that is, one set of newly constructed candidate motion information of the bidirectional prediction type). For example, one piece of motion information using the forward prediction encoding/decoding mode includes: the reference image set is list0, the reference image is the one with reference index 1, and the motion vector is (-3, -5). One piece of motion information using the backward prediction encoding/decoding mode includes: the reference image set is list1, the reference image is the one with reference index 0, and the motion vector is (3, 5). Correspondingly, the combined motion information of the bidirectional prediction encoding/decoding mode includes: forward prediction motion information whose reference image set is list0, reference index is 1, and motion vector is (-3, -5); and backward prediction motion information whose reference image set is list1, reference index is 0, and motion vector is (3, 5).

The decomposition is the inverse of the combination. It refers to splitting motion information that uses a bidirectional prediction encoding/decoding mode (that is, one set of original candidate motion information of the bidirectional prediction type) into motion information that uses a forward prediction encoding/decoding mode (that is, one set of newly constructed candidate motion information of the unidirectional prediction type) and motion information that uses a backward prediction encoding/decoding mode (that is, another set of newly constructed candidate motion information of the unidirectional prediction type). For example, the motion information in the bidirectional prediction encoding/decoding mode includes: forward prediction motion information whose reference image set is list0, reference index is 1, and motion vector is (-3, -5); and backward prediction motion information whose reference image set is list1, reference index is 0, and motion vector is (3, 5). After decomposition, two pieces of motion information are obtained: motion information using the forward prediction encoding/decoding mode, whose reference image set is list0, reference index is 1, and motion vector is (-3, -5); and motion information using the backward prediction encoding/decoding mode, whose reference image set is list1, reference index is 0, and motion vector is (3, 5).
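The combination and decomposition described above can be illustrated with a minimal Python sketch. The type names `DirMotion` and `Candidate` and the functions `combine` and `decompose` are hypothetical names chosen for this illustration; they are not defined in the application, and the sketch only mirrors the numeric example given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DirMotion:
    """Motion information for one prediction direction (hypothetical type)."""
    ref_list: int          # 0 = list0 (forward), 1 = list1 (backward)
    ref_idx: int           # reference image index within that list
    mv: Tuple[int, int]    # motion vector (x, y)


@dataclass(frozen=True)
class Candidate:
    """One set of candidate motion information; either part may be absent."""
    fwd: Optional[DirMotion] = None
    bwd: Optional[DirMotion] = None

    def is_bidirectional(self) -> bool:
        return self.fwd is not None and self.bwd is not None


def combine(a: Candidate, b: Candidate) -> Optional[Candidate]:
    """Merge a forward-only and a backward-only candidate into a bidirectional one."""
    if a.fwd is not None and a.bwd is None and b.bwd is not None and b.fwd is None:
        return Candidate(fwd=a.fwd, bwd=b.bwd)
    return None  # the pair is not (forward-only, backward-only)


def decompose(c: Candidate) -> List[Candidate]:
    """Split a bidirectional candidate into its two unidirectional parts."""
    if not c.is_bidirectional():
        return []
    return [Candidate(fwd=c.fwd), Candidate(bwd=c.bwd)]


# The numeric example from the text.
forward = Candidate(fwd=DirMotion(ref_list=0, ref_idx=1, mv=(-3, -5)))
backward = Candidate(bwd=DirMotion(ref_list=1, ref_idx=0, mv=(3, 5)))
bi = combine(forward, backward)
parts = decompose(bi)
```

In this sketch, decomposing the combined candidate returns exactly the two unidirectional candidates it was built from, which reflects the statement that decomposition is the inverse of combination.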

In some possible implementation manners, the embodiment of the present invention may further include:

Step 735: When the number of candidate motion information in the candidate list of the image block to be processed is still smaller than the target number, for example, when the additional candidates generated in the foregoing manner are still insufficient, the video encoder or the video decoder may also insert zero motion vectors as candidate motion information to generate additional candidates. Such additional candidate motion information is not considered original candidate motion information and may be referred to in this application as later-stage or artificially generated candidate motion information.

It can be seen that in this embodiment, not only can more original candidate motion information be obtained (for example, a motion vector of a spatially non-adjacent image block is used as candidate motion information in the candidate list of the block to be processed), but more additional candidate motion information can also be obtained (for example, candidate motion information of the unidirectional prediction type generated by decomposition, and candidate motion information of the bidirectional prediction type generated by combination). More candidate motion information is therefore available for constructing the candidate list, so that the number of candidates in the candidate list can satisfy the target number (for example, the preset maximum number of candidate motion information in the candidate list, or the number of candidate motion information determined using the index identifier parsed from the code stream) while artificially added zero vectors are avoided or reduced as far as possible. The method for acquiring candidate motion information in this embodiment can be applied to the inter prediction encoding and decoding of video images, thereby improving coding performance.
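As a rough illustration of how Steps 731, 733 and 735 interact, the following Python sketch fills a candidate list up to a target number. It is a simplified model under stated assumptions, not the codec's implementation: a candidate is represented as a pair `(l0, l1)` where each part is `None` or `(ref_idx, (mv_x, mv_y))`, the function name `fill_candidate_list` and the constant `ZERO` are hypothetical, and the zero-vector padding entry (list0, reference index 0) is an assumption made only for this example.

```python
ZERO = ((0, (0, 0)), None)  # assumed padding entry: list0, ref index 0, MV (0, 0)


def fill_candidate_list(originals, target):
    """Return a list of `target` candidates built in the order 731 -> 733 -> 735.

    A candidate is a pair (l0, l1); each part is None or (ref_idx, (mv_x, mv_y)).
    """
    cands = []

    def push(c):
        # Add only while the list is short and the candidate is not a duplicate.
        if len(cands) < target and c not in cands:
            cands.append(c)

    for c in originals:
        push(c)
    orig = list(cands)  # snapshot: only original candidates feed the next steps

    # Step 731: decompose each bidirectional original into two unidirectional ones.
    for l0, l1 in orig:
        if l0 is not None and l1 is not None:
            push((l0, None))
            push((None, l1))

    # Step 733: combine a forward-only original with a backward-only original.
    for a in orig:
        for b in orig:
            if a[1] is None and b[0] is None and a[0] is not None and b[1] is not None:
                push((a[0], b[1]))

    # Step 735: pad with zero motion vectors (appended without pruning here).
    while len(cands) < target:
        cands.append(ZERO)
    return cands


bi = ((1, (-3, -5)), (0, (3, 5)))
from_bi = fill_candidate_list([bi], 5)

f = ((0, (1, 1)), None)
b = (None, (0, (2, 2)))
from_uni = fill_candidate_list([f, b], 4)
```

With a single bidirectional original and a target of 5, the sketch yields the original, its two decomposed parts, and two zero-vector entries, so the decomposition reduces the number of artificial zero vectors from four to two.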

FIG. 8 is another exemplary flowchart of a method for acquiring candidate motion information of an image block in an embodiment of the present application. Process 800 may be performed by a video encoding end (e.g., video encoder 20) or a video decoding end (e.g., video decoder 30).

The schematic process of the video encoding end acquiring candidate motion information to construct a candidate list is as follows:

Steps 801 to 805: In the merge mode, in the process of collecting candidate motion information, detect the motion information of the spatially neighboring blocks of the current coding block and, if it is available, use it as candidate motion information;

Step 807: When the number of available candidate motion information has not reached the maximum number of candidate motion information, detect whether the motion information of the time domain reference block can be used as candidate motion information;

Step 809: When it can be used as candidate motion information, determine whether the number of available candidate motion information has reached the preset maximum number of candidate motion information;

Step 811: When the number of available candidate motion information has not reached the maximum number of candidate motion information, combine existing candidate motion information to construct bidirectional prediction motion information, and determine whether the newly constructed bidirectional prediction motion information can be used as candidate motion information;

Steps 813 to 817: When the number of available candidate motion information has not reached the maximum number of candidate motion information, construct unidirectional prediction motion information from existing candidate motion information of the bidirectional prediction type, and determine whether the newly constructed unidirectional prediction motion information can be used as candidate motion information;

Step 825: When the number of available candidate motion information reaches the maximum number of candidate motion information, end the process of collecting candidate motion information;

Step 823: When the number of available candidate motion information has not reached the maximum number of candidate motion information, continue the process of collecting candidate motion information.

The schematic process of the video decoder acquiring candidate motion information to construct a candidate list is as follows:

Steps 801 to 805: In the merge mode, in the process of collecting candidate motion information, detect the motion information of the spatially neighboring blocks of the current decoding block;

If the motion information is available, use it as candidate motion information, and compare the number of available candidate motion information with the target number determined by the index value received by the video decoder;

If the number of available candidate motion information is consistent with the target number determined by the index value received by the video decoder, use the motion information of the spatially neighboring block of the current decoding block as the best candidate motion information (that is, the candidate motion information selected for the current image block to be decoded, also referred to as target candidate motion information), and perform step 825;

If the number of available candidate motion information is different from the target number determined by the index value received by the video decoder, perform step 807;

Steps 807 to 809: Detect whether the motion information of the time domain reference block can be used as candidate motion information; if it is available, use it as candidate motion information, and compare the number of available candidate motion information with the target number determined by the index value received by the video decoder;

If the number of available candidate motion information is consistent with the target number determined by the index value received by the video decoder, determine that the motion information of the current time domain reference block is the best candidate motion information, that is, determine the motion information of the current time domain reference block as the motion information of the image block to be decoded, or determine the motion information of the image block to be decoded by using the motion information of the current time domain reference block, and end the process of collecting candidate motion information;

If the number of available candidate motion information is different from the target number determined by the index value received by the video decoder, continue to step 811;

Steps 811 to 815: Combine existing candidate motion information to construct bidirectional prediction motion information, and determine whether the newly constructed bidirectional prediction motion information can be used as candidate motion information;

If the number of available candidate motion information is consistent with the target number determined by the index value received by the video decoder, determine that the currently constructed bidirectional prediction motion information is the best candidate motion information, that is, determine the currently constructed bidirectional prediction motion information as the motion information of the image block to be decoded, or determine the motion information of the image block to be decoded by using the currently constructed bidirectional prediction motion information, and end the process of collecting candidate motion information;

If the number of available candidate motion information is different from the target number determined by the index value received by the video decoder, continue to step 817;

Steps 817 to 821: Construct unidirectional prediction motion information from existing candidate motion information of the bidirectional prediction type, and determine whether the newly constructed unidirectional prediction motion information can be used as candidate motion information;

If the newly constructed unidirectional prediction motion information can be used as candidate motion information, determine whether the number of available candidate motion information is consistent with the target number determined by the index value received by the video decoder;

If the number of available candidate motion information is consistent with the target number determined by the index value received by the video decoder, determine the newly constructed unidirectional prediction motion information as the best candidate motion information, that is, determine the newly constructed unidirectional prediction motion information as the motion information of the image block to be decoded, or determine the motion information of the image block to be decoded by using the newly constructed unidirectional prediction motion information, and end the process;

If the number of available candidate motion information is different from the target number determined by the index value received by the video decoder, perform step 823;

Step 823, continuing to perform the process of collecting candidate motion information;

Step 825, ending the process of collecting candidate motion information.

An example of how the index is encoded, in the form of a table of the correspondence between index values and their encodings, is given below. The index corresponding to a candidate in the merge candidate list is encoded with a variable-length code, for example:

Index position in the candidate list	merge_idx index encoding
0	1
1	01
2	001
3	0001
4	00001
5	000001

For example, if the candidate index received by the decoding end is "1", the candidate motion information selected for the current image block to be decoded is the candidate at index position 0 in the merge candidate list and, correspondingly, the target number of candidate motion information is 1; if the candidate index received by the decoding end is "000001", the candidate motion information selected for the current image block to be decoded is the candidate at index position 5 in the merge candidate list and, correspondingly, the target number of candidate motion information is 5.
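The codewords in the table above follow a simple pattern: index position k is written as k zeros followed by a single one. A minimal Python sketch of that pattern is shown below. The function names `encode_merge_idx` and `decode_merge_idx` are hypothetical, bit strings are modeled as Python strings for readability, and the sketch covers only the mapping shown in the table, not any entropy coding applied in a real codec.

```python
def encode_merge_idx(index: int) -> str:
    """Return the codeword for an index position: `index` zeros, then a one."""
    if index < 0:
        raise ValueError("index position must be non-negative")
    return "0" * index + "1"


def decode_merge_idx(bits: str, pos: int = 0):
    """Read one codeword starting at `pos`.

    Returns (index position, position of the first bit after the codeword).
    """
    index = 0
    while pos < len(bits) and bits[pos] == "0":
        index += 1
        pos += 1
    if pos >= len(bits) or bits[pos] != "1":
        raise ValueError("bit string ended before the terminating '1'")
    return index, pos + 1


codewords = [encode_merge_idx(i) for i in range(6)]
```

Because every codeword ends at its first "1", the decoder can locate the end of the index without knowing its length in advance, which is what makes the variable-length code decodable.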

As an example, assume that the index value obtained by decoding at the decoding end is 4. In the process of acquiring candidate motion information in the merge mode, if the motion information of the lower-right spatially adjacent block H of the co-located block of the current decoding block can be used as candidate motion information, the number of available candidate motion information is 3, which differs from the target number derived from the index value. If the motion information of the intermediate block C0 or C3 of the co-located block cannot be determined to be available candidate motion information, the number of available candidate motion information is still 3, which differs from the target number derived from the index value. If the motion information of the unidirectional prediction type constructed from existing original candidate motion information of the bidirectional prediction type can be used as available candidate motion information, the number of available candidate motion information is then 4, which is the same as the target number derived from the index value. The constructed motion information of the unidirectional prediction type is selected as the best candidate motion information (that is, the candidate motion information selected for the current image block to be decoded), and the process of acquiring candidate motion information is ended.

It can be seen that the number of candidate motion information to be constructed in the candidate list is obtained according to the index value. Once the number of acquired candidate motion information is sufficient to determine the target candidate motion information by using the index value, that is, once the target candidate motion information and the candidate motion information ranked before it in the candidate list have been constructed, construction of the other candidate motion information in the candidate list is stopped.

It should be understood that there are two ways for the decoder to construct the candidate list. One is to match while detecting, as described above. The other is to construct the entire candidate list first and then use the index value to determine which candidate motion information to select, that is, to find the candidate motion vector at the position indicated by the index in the completed candidate list.
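The difference between the two decoder-side approaches can be sketched in Python as follows. This is a simplified model under stated assumptions: each detection stage (spatial, time domain, constructed candidates) is represented by a hypothetical callable that returns a list of candidates, candidates are plain strings, and the function names are chosen only for this illustration.

```python
from itertools import islice


def candidate_stream(sources):
    """Yield unique candidates in detection order, calling each stage lazily."""
    seen = []
    for src in sources:
        for c in src():
            if c is not None and c not in seen:
                seen.append(c)
                yield c


def select_on_the_fly(sources, index):
    """First approach: stop collecting as soon as the indexed candidate exists."""
    for pos, c in enumerate(candidate_stream(sources)):
        if pos == index:
            return c
    raise IndexError("index position exceeds the number of available candidates")


def select_from_full_list(sources, index, max_num):
    """Second approach: build the whole list, then look up the index."""
    full = list(islice(candidate_stream(sources), max_num))
    return full[index]


calls = []


def spatial():
    calls.append("spatial")
    return ["A", "B"]


def temporal():
    calls.append("temporal")
    return ["T"]


def constructed():
    calls.append("constructed")
    return ["D"]


stages = [spatial, temporal, constructed]
picked_lazy = select_on_the_fly(stages, 2)
calls_after_lazy = list(calls)
picked_full = select_from_full_list(stages, 2, 5)
```

Both approaches select the same candidate, but in this sketch the first one never invokes the `constructed` stage, because the candidate at index position 2 is already available after the time domain stage.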

It should be understood that the foregoing candidate list may be used in the Merge mode described above or in other prediction modes for acquiring a predicted motion vector of a to-be-processed image block, and may be used at the encoding end or, consistently with the corresponding encoding end, at the decoding end. For example, the number of candidates in the candidate list is also the preset maximum number and is consistent at the encoding end and the decoding end; the specific number is not limited. In this case, for the operation of the decoding end, refer to the encoding end; details are not repeated here.

FIG. 9 is an exemplary schematic diagram of adding a decomposed candidate motion vector to a merge mode candidate list in the embodiment of the present application. A merge candidate of the unidirectional prediction type is generated by decomposing an original merge candidate of the bidirectional prediction type. In particular, one original candidate of the bidirectional prediction type (which has mvL0 and refIdxL0, and mvL1 and refIdxL1) can be used to generate two unidirectional predictive merge candidates. In FIG. 9, an original merge candidate of the bidirectional prediction type (having mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) is included in the original merge candidate list at index position 0. After decomposition, two candidates are obtained. For one newly constructed candidate of the unidirectional prediction type, the prediction type is list 0 unidirectional prediction, and mvL0_A and ref0 are picked up from list 0. For the other newly constructed candidate of the unidirectional prediction type, the prediction type is list 1 unidirectional prediction, and mvL1_B and ref0 are picked up from list 1. It is then checked whether each newly constructed merge candidate differs from the candidates already included in the merge candidate list. If it is different, the video decoder or video encoder includes the newly constructed merge candidate of the unidirectional prediction type in the merge candidate list.

FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate list in the embodiment of the present application. A combined bi-predictive merge candidate can be generated by combining original merge candidates. In particular, two of the original candidates (which have mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate a bi-predictive merge candidate. In FIG. 10, two candidates are included in the original merge candidate list. The prediction type of one candidate is list 0 unidirectional prediction, and the prediction type of the other candidate is list 1 unidirectional prediction. In this possible implementation, mvL0_A and ref0 are picked up from list 0, mvL1_B and ref0 are picked up from list 1, and a bi-predictive merge candidate (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) is then generated and checked to determine whether it differs from the candidates already included in the candidate list. If it is different, the video decoder may include the bi-predictive merge candidate in the candidate list.

In one possible implementation, if a newly generated candidate is different from the candidates already included in the candidate list, the generated candidate is added to the merge candidate list. The process of determining whether a candidate differs from the candidates already included in the candidate list is sometimes referred to as pruning. Through pruning, each newly generated candidate can be compared with the existing candidates in the list. In some possible implementations, the pruning operation may include comparing one or more new candidates with the candidates already in the candidate list and not adding new candidates that are duplicates of candidates already in the candidate list. In other possible implementations, the pruning operation can include adding one or more new candidates to the candidate list and later removing the duplicate candidates from the list.
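The two pruning variants described above can be sketched in Python as follows. The function names `add_with_pruning` and `add_then_prune` are hypothetical, and candidates are modeled as any values that can be compared for equality; the sketch only illustrates that both variants leave the same list contents.

```python
def add_with_pruning(cand_list, new_cands):
    """Compare first: a new candidate is appended only if it is not a duplicate."""
    for c in new_cands:
        if c not in cand_list:
            cand_list.append(c)
    return cand_list


def add_then_prune(cand_list, new_cands):
    """Add first: append everything, then remove later duplicates in place."""
    cand_list.extend(new_cands)
    unique = []
    for c in cand_list:
        if c not in unique:
            unique.append(c)
    cand_list[:] = unique  # keep the first occurrence of each candidate
    return cand_list


base = ["A", "B"]
new = ["B", "C", "C"]
pruned_before = add_with_pruning(list(base), new)
pruned_after = add_then_prune(list(base), new)
```

Both variants preserve the order in which candidates first appear, so the index positions signaled to the decoder refer to the same candidates whichever variant is used.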

FIG. 11 is a schematic block diagram of an apparatus 1100 for acquiring candidate motion information of an image block in an embodiment of the present application. The candidate motion information is used to construct a candidate list for inter prediction, and the apparatus 1100 for acquiring candidate motion information of the image block includes:

A spatial domain candidate motion information acquiring module 1101, configured to detect one or more spatial reference blocks of the current image block according to a first preset sequence, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block, where M is an integer greater than or equal to 0;

A time domain candidate motion information acquiring module 1102, configured to detect one or more time domain reference blocks of the current image block according to a second preset sequence, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0;

An additional candidate motion information acquiring module 1103, configured to: when the number of candidate motion information in the candidate list of the image block to be processed is smaller than a target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.

In a feasible implementation manner, a set of original candidate motion information of the bidirectional prediction type includes motion information for a forward prediction direction and motion information for a backward prediction direction. The motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index. The motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.

After the decomposition processing by the additional candidate motion information acquiring module, the Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type is the forward prediction encoding/decoding mode, and/or a set of motion information whose unidirectional prediction type is the backward prediction encoding/decoding mode. The set of motion information of the forward prediction encoding/decoding mode includes the first reference image list, the first reference image index corresponding to the first reference image list, and the motion vector pointing to the first reference image corresponding to the first reference image index. The set of motion information of the backward prediction encoding/decoding mode includes the second reference image list, the second reference image index corresponding to the second reference image list, and the motion vector pointing to the second reference image corresponding to the second reference image index.

In a possible implementation manner, when the number of candidate motion information in the candidate list of the to-be-processed image block is smaller than the target number, the additional candidate motion information acquiring module is further configured to combine two sets of original candidate motion information of the unidirectional prediction type included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.

In a possible implementation, the one or more spatial reference blocks include: one or more spatial reference blocks, in the image in which the current image block is located, that are adjacent to the current image block, and/or one or more spatial reference blocks, in the image in which the current image block is located, that are not adjacent to the current image block.

In a possible implementation, the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:

In a possible implementation manner, the spatial domain candidate motion information acquiring module is configured to:

In a possible implementation manner, the one or more time domain reference blocks include: a lower-right spatially adjacent block H of a co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, where the co-located block is an image block in a reference image that has the same size, shape, and coordinates as the current image block.

In a possible implementation manner, the time domain candidate motion information acquiring module is configured to:

In a possible implementation, the apparatus 1100 is configured to encode or decode a video image, and the target number is a preset maximum number of candidate motion information in the candidate list of the current image block; or, the apparatus 1100 is configured to decode a video image, and the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.

It can be seen that in this embodiment, not only can more original candidate motion information be obtained (for example, a motion vector of a spatially non-adjacent image block is used as candidate motion information in the candidate list of the block to be processed), but more additional candidate motion information can also be obtained (for example, candidate motion information of the unidirectional prediction type generated by decomposition, and candidate motion information of the bidirectional prediction type generated by combination). The candidate list therefore includes more candidate motion information, so that the number of candidates in the candidate list satisfies the target number (for example, the preset maximum number of candidate motion information in the candidate list, or the number of candidate motion information determined using the index identifier parsed from the code stream), which improves coding performance.

It should be understood that, in the embodiment of the present invention, an image block with a determined motion vector is an image block whose motion vector has already been determined when the image block to be processed is predicted. It may be an image block that has been reconstructed or an image block that has not been reconstructed, which is not limited.

FIG. 12 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as decoding device 1200 for short) in an embodiment of the present application. The decoding device 1200 can include a processor 1210, a memory 1230, and a bus system 1250. The processor and the memory are connected by the bus system, the memory is used for storing instructions, and the processor is used for executing the instructions stored in the memory. The memory stores program code, and the processor can invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein, particularly video encoding or decoding methods in various new inter prediction modes and methods of predicting motion information in various new inter prediction modes. To avoid repetition, details are not described here.

In the embodiment of the present application, the processor 1210 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor or the like.

The memory 1230 can include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 1230. The memory 1230 can include code and data 1231 that are accessed by the processor 1210 using the bus 1250. The memory 1230 can further include an operating system 1233 and an application 1235, where the application 1235 includes at least one program that allows the processor 1210 to perform the video encoding or decoding methods described herein (in particular, the method for acquiring candidate motion information of an image block described herein). For example, the application 1235 can include applications 1 through N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding methods described herein.

The bus system 1250 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 1250 in the figure.

Optionally, the decoding device 1200 may also include one or more output devices, such as a display 1270. In one example, the display 1270 can be a touch-sensitive display that combines the display with a touch-sensing unit operable to sense touch input. The display 1270 can be coupled to the processor 1210 via the bus 1250.

Although specific aspects of the present application have been described in relation to video encoder 20 and video decoder 30, it should be understood that the techniques of the present application may be through many other video encoding and/or encoding units, processors, processing units, such as encoder/decode The hardware-based coding unit of the (CODEC) and the like are applied. Moreover, it should be understood that the steps shown and described with respect to FIG. 5 are provided only as a possible implementation. That is, the steps shown in the possible embodiments of FIG. 5 need not necessarily be performed in the order shown in FIG. 5, and fewer, additional, or alternative steps may be performed.

In addition, it is to be understood that, depending on the possible embodiment, the specific actions or events of any of the methods described herein may be performed in a different sequence, and may be added, combined, or omitted altogether (e.g., not all described actions or events are necessary to practice the method). Moreover, in a particular possible implementation, actions or events can be performed concurrently rather than sequentially, for example via multi-threaded processing, interrupt processing, or multiple processors. In addition, while specific aspects of the present application are described as being performed by a single module or unit for purposes of clarity, it should be understood that the techniques of the present application can be implemented by a combination of units or modules associated with a video decoder.

In one or more possible implementations, the functions described can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted via, a computer readable medium as one or more instructions or code and executed by a hardware-based processing unit. The computer readable medium can comprise a computer readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one location to another, for example in accordance with a communication protocol.

In this manner, computer readable media may illustratively correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application. The computer program product can comprise a computer readable medium.

As a possible implementation and not a limitation, the computer readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.

However, it should be understood that computer readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks typically reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques can be fully implemented in one or more circuits or logic elements.

The techniques of the present application can be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of apparatuses configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit, or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

The foregoing is only an exemplary embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any change or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed by the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.

Claims

A method for acquiring candidate motion information of an image block, wherein the candidate motion information is used to construct a candidate list for inter prediction, the method comprising:

detecting one or more spatial reference blocks of the current image block according to a first preset order, to obtain M sets of original candidate motion information in the candidate list of the current image block, where M is an integer greater than or equal to 0;

detecting one or more time domain reference blocks of the current image block according to a second preset order, to obtain L sets of original candidate motion information in the candidate list of the image block to be processed, where L is an integer greater than or equal to 0; and

when the number of candidate motion information in the candidate list of the image block to be processed is smaller than a target number, decomposing at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.
The method according to claim 1, wherein a set of original candidate motion information of the bidirectional prediction type comprises: motion information for a forward prediction direction and motion information for a backward prediction direction, wherein the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index;

after the decomposition, the Q sets of newly constructed candidate motion information of the unidirectional prediction type comprise: a set of motion information whose unidirectional prediction type is the forward prediction direction and/or a set of motion information whose unidirectional prediction type is the backward prediction direction, wherein the set of motion information of the forward prediction direction includes the first reference image list, the first reference image index corresponding to the first reference image list, and the motion vector of the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes the second reference image list, the second reference image index corresponding to the second reference image list, and the motion vector of the second reference image corresponding to the second reference image index.
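For illustration only, the decomposition described above can be sketched as follows. This is a minimal sketch and not the claimed implementation: the names `DirMotion`, `Candidate`, and `decompose_bidirectional` are hypothetical, motion vectors are modeled as integer pairs, and the choice to append the forward part before the backward part is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DirMotion:
    """Motion information for one prediction direction (hypothetical model)."""
    ref_list: int          # reference image list (0 = first, 1 = second)
    ref_idx: int           # reference image index within that list
    mv: Tuple[int, int]    # motion vector (x, y)


@dataclass(frozen=True)
class Candidate:
    """One set of candidate motion information."""
    fwd: Optional[DirMotion] = None   # forward prediction direction
    bwd: Optional[DirMotion] = None   # backward prediction direction

    @property
    def is_bidirectional(self) -> bool:
        return self.fwd is not None and self.bwd is not None


def decompose_bidirectional(candidates: List[Candidate], target: int) -> List[Candidate]:
    """Append unidirectional candidates split from bidirectional originals
    until the list holds `target` entries or no originals remain."""
    out = list(candidates)
    for cand in candidates:               # iterate over the originals only
        if len(out) >= target:
            break
        if not cand.is_bidirectional:
            continue
        # Assumed order: forward-only part first, then backward-only part.
        for uni in (Candidate(fwd=cand.fwd), Candidate(bwd=cand.bwd)):
            if len(out) >= target:
                break
            out.append(uni)
    return out
```

In this sketch a single bidirectional original yields at most two new unidirectional candidates, so Q is at most twice the number of bidirectional originals and may be 0 when the list is already full.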
The method according to claim 1 or 2, wherein the method further comprises:

when the number of candidate motion information in the candidate list of the image block to be processed is smaller than the target number, combining two sets of original candidate motion information of the unidirectional prediction type included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.
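As a hedged illustration of the combination step, the sketch below pairs a forward-only original with a backward-only original to form a new bidirectional candidate. The names `DirMotion`, `Candidate`, and `combine_unidirectional` are hypothetical, and the pairing order (every ordered pair of originals, in list order) is an assumption rather than something the text specifies.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional, Tuple


class DirMotion(NamedTuple):
    """Motion information for one prediction direction (hypothetical model)."""
    ref_list: int
    ref_idx: int
    mv: Tuple[int, int]


class Candidate(NamedTuple):
    """One set of candidate motion information."""
    fwd: Optional[DirMotion] = None
    bwd: Optional[DirMotion] = None


def combine_unidirectional(candidates: List[Candidate], target: int) -> List[Candidate]:
    """Append bidirectional candidates built from one forward-only and one
    backward-only original until the list holds `target` entries."""
    out = list(candidates)
    for a, b in permutations(candidates, 2):   # ordered pairs of originals
        if len(out) >= target:
            break
        a_is_forward_only = a.fwd is not None and a.bwd is None
        b_is_backward_only = b.bwd is not None and b.fwd is None
        if a_is_forward_only and b_is_backward_only:
            out.append(Candidate(fwd=a.fwd, bwd=b.bwd))
    return out
```

If the originals contain no forward-only and backward-only pair, nothing is appended, which corresponds to P being 0.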
The method according to any one of claims 1 to 3, wherein the one or more spatial reference blocks comprise: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in the image in which the current image block is located that are not adjacent to the image block to be processed.
The method according to claim 4, wherein the one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located comprise:

a fourth spatial neighboring block A0 located at the lower left side of the current image block, a first spatial neighboring block A1 located at the left side of the current image block, a third spatial neighboring block B0 located at the upper right side of the current image block, a second spatial neighboring block B1 located at the upper side of the current image block, or a fifth spatial neighboring block B2 located at the upper left side of the current image block.
The method according to claim 5, wherein the detecting one or more spatial reference blocks of the current image block according to the first preset order, to obtain the M sets of original candidate motion information in the candidate list of the image block to be processed, comprises:

detecting whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain motion information of M1 image blocks whose motion vectors have been determined among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and

adding M sets of motion information, from the motion information of the detected M1 image blocks whose motion vectors have been determined, to the candidate list as candidate motion information, where M1 is greater than or equal to M;

wherein the detection condition of the fifth spatial neighboring block B2 includes: when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable, the fifth spatial neighboring block B2 is detected.
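The detection order and the condition on B2 can be sketched as follows. This is an illustrative sketch under stated assumptions: neighbors are modeled as a dictionary that maps a block name to its motion information, or to `None` when the block is unavailable, and `spatial_candidates` and `max_spatial` (standing in for M) are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

MotionInfo = Tuple[int, int]   # placeholder for one set of motion information

FIRST_FOUR = ("A1", "B1", "B0", "A0")   # detection order given in the text


def spatial_candidates(neighbors: Dict[str, Optional[MotionInfo]],
                       max_spatial: int) -> List[MotionInfo]:
    """Return up to `max_spatial` sets of motion information from the
    spatial neighbors, checking B2 only if one of the first four is missing."""
    found = [neighbors[name] for name in FIRST_FOUR
             if neighbors.get(name) is not None]
    any_unavailable = any(neighbors.get(name) is None for name in FIRST_FOUR)
    if any_unavailable and neighbors.get("B2") is not None:
        found.append(neighbors["B2"])
    # `found` holds the M1 detected sets; keep the first M of them.
    return found[:max_spatial]
```

With all of A1, B1, B0, and A0 available, B2 is never consulted; B2 contributes only when at least one of the first four is unavailable.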
The method according to any one of claims 1 to 3, wherein the one or more time domain reference blocks comprise: a lower right spatial neighboring block H of a co-located block of the current image block, an upper left intermediate block C0 of the co-located block, a lower right intermediate block C3 of the co-located block, an upper left block TL of the co-located block, or a lower right block BR of the co-located block, wherein the co-located block is an image block in a reference image that has the same size, shape, and coordinates as the current image block.
The method according to claim 7, wherein the detecting one or more time domain reference blocks of the current image block according to the second preset order, to obtain the L sets of original candidate motion information in the candidate list of the image block to be processed, comprises:

detecting, in sequence, whether the lower right spatial neighboring block H of the co-located block and the lower right intermediate block C3 of the co-located block are available, to obtain motion information of L1 image blocks whose motion vectors have been determined; or

detecting, in sequence, whether the lower right spatial neighboring block H of the co-located block and the upper left intermediate block C0 of the co-located block are available, to obtain motion information of L2 image blocks whose motion vectors have been determined; or

detecting, in sequence, whether the lower right spatial neighboring block H of the co-located block, the lower right intermediate block C3 of the co-located block, the upper left block TL of the co-located block, the lower right block BR of the co-located block, and the upper left intermediate block C0 of the co-located block are available, to obtain motion information of L3 image blocks whose motion vectors have been determined; and

adding L sets of motion information, from the motion information of the detected L1, L2, or L3 image blocks whose motion vectors have been determined, to the candidate list as candidate motion information, where L1 is greater than or equal to L, or L2 is greater than or equal to L, or L3 is greater than or equal to L, and L1, L2, and L3 are all integers greater than or equal to 0.
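The three alternative detection sequences can be illustrated with the sketch below. It is a simplified model and not the claimed implementation: the co-located positions are modeled as a dictionary that maps a position name to motion information or `None`, and the names `TEMPORAL_ORDERS`, `temporal_candidates`, and `max_temporal` (standing in for L) are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

MotionInfo = Tuple[int, int]   # placeholder for one set of motion information

# The three alternative detection sequences listed in the text.
TEMPORAL_ORDERS: Dict[str, Tuple[str, ...]] = {
    "H_C3": ("H", "C3"),
    "H_C0": ("H", "C0"),
    "FULL": ("H", "C3", "TL", "BR", "C0"),
}


def temporal_candidates(colocated: Dict[str, Optional[MotionInfo]],
                        order: Sequence[str],
                        max_temporal: int = 1) -> List[MotionInfo]:
    """Check the positions in `order` and return up to `max_temporal`
    sets of motion information from those that are available."""
    found = [colocated[pos] for pos in order if colocated.get(pos) is not None]
    # `found` holds the L1, L2, or L3 detected sets; keep the first L of them.
    return found[:max_temporal]
```

For example, with the "H_C3" sequence and `max_temporal=1`, the motion information at H is used when available, and C3 serves as the fallback.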
The method according to any one of claims 1 to 8, wherein

the target number is a preset maximum number of candidate motion information in the candidate list of the current image block;

or,

the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.
An apparatus for acquiring candidate motion information of an image block, wherein the candidate motion information is used to construct a candidate list for inter prediction, and the apparatus includes:

a spatial candidate motion information acquiring module, configured to detect one or more spatial reference blocks of the current image block according to a first preset order, to obtain M sets of original candidate motion information in the candidate list of the image block to be processed, where M is an integer greater than or equal to 0;

a time domain candidate motion information acquiring module, configured to detect one or more time domain reference blocks of the current image block according to a second preset order, to obtain L sets of original candidate motion information in the candidate list of the image block to be processed, where L is an integer greater than or equal to 0; and

an additional candidate motion information acquiring module, configured to: when the number of candidate motion information in the candidate list of the image block to be processed is smaller than a target number, decompose at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.
The apparatus according to claim 10, wherein a set of original candidate motion information of the bidirectional prediction type comprises: motion information for a forward prediction direction and motion information for a backward prediction direction, wherein the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index;

after the decomposition by the additional candidate motion information acquiring module, the Q sets of newly constructed candidate motion information of the unidirectional prediction type comprise: a set of motion information whose unidirectional prediction type is a forward prediction encoding/decoding mode and/or a set of motion information whose unidirectional prediction type is a backward prediction encoding/decoding mode, wherein the set of motion information of the forward prediction encoding/decoding mode includes the first reference image list, the first reference image index corresponding to the first reference image list, and the motion vector of the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction encoding/decoding mode includes the second reference image list, the second reference image index corresponding to the second reference image list, and the motion vector of the second reference image corresponding to the second reference image index.
The apparatus according to claim 10 or 11, wherein when the number of candidate motion information in the candidate list of the image block to be processed is smaller than the target number, the additional candidate motion information acquiring module is further configured to: combine two sets of original candidate motion information of the unidirectional prediction type included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.
The apparatus according to any one of claims 10 to 12, wherein the one or more spatial reference blocks comprise: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in the image in which the current image block is located that are not adjacent to the image block to be processed.
The apparatus according to claim 13, wherein the one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located comprise:

a fourth spatial neighboring block A0 located at the lower left side of the current image block, a first spatial neighboring block A1 located at the left side of the current image block, a third spatial neighboring block B0 located at the upper right side of the current image block, a second spatial neighboring block B1 located at the upper side of the current image block, or a fifth spatial neighboring block B2 located at the upper left side of the current image block.
The apparatus according to claim 14, wherein the spatial candidate motion information acquiring module is configured to:

detect whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain motion information of M1 image blocks whose motion vectors have been determined among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and

add M sets of motion information, from the motion information of the detected M1 image blocks whose motion vectors have been determined, to the candidate list as candidate motion information, where M1 is greater than or equal to M;

wherein the detection condition of the fifth spatial neighboring block B2 includes: when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable, the fifth spatial neighboring block B2 is detected.
The apparatus according to any one of claims 10 to 12, wherein the one or more time domain reference blocks comprise: a lower right spatial neighboring block H of a co-located block of the current image block, an upper left intermediate block C0 of the co-located block, a lower right intermediate block C3 of the co-located block, an upper left block TL of the co-located block, or a lower right block BR of the co-located block, wherein the co-located block is an image block in a reference image that has the same size, shape, and coordinates as the current image block.
The apparatus according to claim 16, wherein the time domain candidate motion information acquiring module is configured to:

detect, in sequence, whether the lower right spatial neighboring block H of the co-located block and the lower right intermediate block C3 of the co-located block are available, to obtain motion information of L1 image blocks whose motion vectors have been determined; or

detect, in sequence, whether the lower right spatial neighboring block H of the co-located block and the upper left intermediate block C0 of the co-located block are available, to obtain motion information of L2 image blocks whose motion vectors have been determined; or

detect, in sequence, whether the lower right spatial neighboring block H of the co-located block, the lower right intermediate block C3 of the co-located block, the upper left block TL of the co-located block, the lower right block BR of the co-located block, and the upper left intermediate block C0 of the co-located block are available, to obtain motion information of L3 image blocks whose motion vectors have been determined; and

add L sets of motion information, from the motion information of the detected L1, L2, or L3 image blocks whose motion vectors have been determined, to the candidate list as candidate motion information, where L1 is greater than or equal to L, or L2 is greater than or equal to L, or L3 is greater than or equal to L, and L1, L2, and L3 are all integers greater than or equal to 0.
The apparatus according to any one of claims 10 to 17, wherein

the apparatus is configured to encode or decode a video image, and the target number is a preset maximum number of candidate motion information in the candidate list of the current image block;

or,

the apparatus is configured to decode a video image, and the target number is the number of candidate motion information determined using an index identifier parsed from the code stream.
A video encoder, wherein the video encoder is configured to encode an image block, and comprises:

an inter predictor, comprising the apparatus for acquiring candidate motion information of an image block according to any one of claims 10 to 18, wherein the inter predictor is configured to determine a prediction block of a current image block to be encoded based on candidate motion information selected from the candidate list;

an entropy encoder, configured to encode an index identifier into a code stream, where the index identifier is used to indicate the candidate motion information selected for the current image block to be encoded; and

a reconstructor, configured to reconstruct the image block based on the prediction block.
A video decoder, wherein the video decoder is configured to decode an image block from a code stream, and comprises:

an entropy decoder, configured to decode an index identifier from the code stream, where the index identifier is used to indicate the candidate motion information selected for a current image block to be decoded;

an inter predictor, comprising the apparatus for acquiring candidate motion information of an image block according to any one of claims 10 to 18, wherein the inter predictor is configured to determine a prediction block of the current image block to be decoded based on the candidate motion information indicated by the index identifier; and

a reconstructor, configured to reconstruct the image block based on the prediction block.