WO2019084776A1 - Method and device for obtaining candidate motion information of an image block, and codec - Google Patents

Method and device for obtaining candidate motion information of an image block, and codec

Info

Publication number
WO2019084776A1
WO2019084776A1 (PCT/CN2017/108611)
Authority
WO
WIPO (PCT)
Prior art keywords
block
motion information
image
candidate
list
Prior art date
Application number
PCT/CN2017/108611
Other languages
English (en)
Chinese (zh)
Inventor
陈旭
安基程
郑建铧
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2017/108611
Publication of WO2019084776A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding

Definitions

  • the present application relates to the field of video image coding and decoding technologies, and in particular, to a method, an apparatus, an encoder, and a decoder for acquiring candidate motion information of an image block.
  • Digital video information can be efficiently transmitted and received between devices by means of the ITU-T H.265 high efficiency video coding (HEVC) standard and the video compression techniques described in the extensions of that standard.
  • an image of a video sequence is divided into image blocks for encoding or decoding.
  • The inter prediction mode may include, but is not limited to, a merge mode (Merge Mode) and a non-merge mode (for example, an advanced motion vector prediction mode (AMVP mode)), both of which perform inter prediction by letting multiple sets of motion information compete.
  • a candidate list including multiple sets of motion information (also referred to as candidate motion information) is introduced.
  • The encoder may select suitable candidate motion information from the candidate list to predict the motion information (e.g., the motion vector) of the current image block to be encoded, thereby obtaining the best reference image block (i.e., the prediction block) of the current image block to be encoded.
  • The maximum number of candidate motion information entries in the candidate list is defined. When the available candidate motion information does not reach that maximum number, the candidate list may be padded with candidate motion information of a default value (e.g., a zero vector), and an index identifier is assigned to each set of candidate motion information.
  • It can be seen that this approach leaves some candidate motion information in the candidate list with little reference value, which in turn lowers the accuracy of motion vector prediction and thereby affects codec performance.
  • The embodiments of the present application provide a method and an apparatus for acquiring candidate motion information of an image block, and a corresponding encoder and decoder, so as to improve the accuracy of motion vector prediction and thereby improve codec performance.
  • In a first aspect, an embodiment of the present application provides a method for acquiring candidate motion information of an image block, where the candidate motion information is used to construct a candidate list for inter prediction. The method includes: the candidate motion information acquiring apparatus detects, in a first preset order, one or more spatial reference blocks of the current image block, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block, where M is an integer greater than or equal to 0; the candidate motion information acquiring apparatus detects, in a second preset order, one or more temporal reference blocks of the current image block, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0; and, when the number of candidate motion information entries in the candidate list of the to-be-processed image block is less than a target number, the candidate motion information acquiring apparatus performs decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list (also referred to as original candidate motion information of the bidirectional predictive encoding/decoding mode), to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the to-be-processed image block, where Q is an integer greater than or equal to 0.
  • The decomposition here can be understood as the inverse process of the combination, that is, splitting a set of motion information using the bidirectional predictive encoding/decoding mode into a set of motion information using the backward predictive encoding/decoding mode and a set of motion information using the forward predictive encoding/decoding mode.
  • The spatial reference block herein refers to a reference block spatially related to the current image block, and may include one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.
  • The temporal reference block herein refers to a reference block temporally related to the current image block, and may include one or more spatial reference blocks adjacent to the co-located block in the reference image, and/or one or more sub-blocks of the co-located block, where the co-located block is the image block of the reference image that has the same size, shape, and coordinates as the current image block.
  • The reference image herein refers to a reconstructed image. Specifically, it refers to a reference image in one or more reference image lists; for example, it may be the reference image corresponding to a specified reference image index in a specified reference image list, or it may be the reference image in the first position of a default reference image list, which is not limited in this application. It should be noted that, whichever reference block is used, it refers to an image block whose motion vector has been determined (also referred to as an encoded image block or a decoded image block).
  • The motion information of each reference block may include a motion vector (MV) and reference image indication information.
  • In some cases, the motion information may include only one of the two; for example, the motion information may include only the motion vector MV.
  • The reference image indication information is used to indicate which reconstructed image or images are used as the reference image of the current block (here, the current block refers to the reference block currently being described), and the motion vector indicates the positional offset of the reference block position in the used reference image relative to the current block position, generally including a horizontal component offset and a vertical component offset.
  • the reference image indication information may include a reference image list and a reference image index corresponding to the reference image list.
  • the reference image index is used to identify the reference image pointed to by the motion vector in the specified reference image list (RefPicList0 or RefPicList1).
  • a set of motion information for the reference block may include motion information for the forward and backward prediction directions.
  • The forward and backward prediction directions are the two prediction directions of the bidirectional prediction mode, and it can be understood that "forward" and "backward" correspond respectively to the reference image list 0 (RefPicList0) and the reference image list 1 (RefPicList1) of the current image.
  • The "forward" prediction direction (RefPicList0) means that the reference image is temporally before the current image.
  • The "backward" prediction direction (RefPicList1) means that the reference image is temporally after the current image.
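For illustration only, the motion information fields described above can be modeled as a small data structure. This is a minimal Python sketch; all names in it (MotionInfo, PredDir, DirInfo, fwd, bwd) are assumptions of the sketch, not definitions from this application.

```python
# A minimal sketch (not from this application) of the motion information
# described above; all names here are illustrative assumptions.
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class PredDir(Enum):
    FORWARD = 0   # uses reference image list 0 (RefPicList0)
    BACKWARD = 1  # uses reference image list 1 (RefPicList1)
    BI = 2        # bidirectional: uses both lists

# One direction's data: (reference image list id, reference image index,
# motion vector as (horizontal offset, vertical offset)).
DirInfo = Tuple[int, int, Tuple[int, int]]

@dataclass(frozen=True)   # frozen -> hashable, convenient for pruning later
class MotionInfo:
    pred_dir: PredDir
    fwd: Optional[DirInfo] = None  # present for FORWARD and BI
    bwd: Optional[DirInfo] = None  # present for BACKWARD and BI
```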
  • The candidate motion information acquiring apparatus may be a video encoder or a video decoder, for example, a motion estimator in a video encoder, or a motion compensator in a video decoder.
  • The candidate motion information in the candidate list of the to-be-processed image block may include the foregoing M sets of original candidate motion information and L sets of original candidate motion information, and may of course also include candidate motion information acquired in other manners, which is not limited in this application.
  • A set of original candidate motion information of the bidirectional prediction type includes: motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type (also referred to as unidirectional predictive encoding/decoding mode) is the forward prediction direction (also referred to as forward predictive encoding/decoding mode) and/or a set of motion information whose unidirectional prediction type is the backward prediction direction, where the set of motion information of the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
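A minimal sketch of the decomposition step described above, reusing the hypothetical MotionInfo type from the earlier sketch: one bidirectional candidate is split into up to two newly constructed unidirectional candidates.

```python
# Decomposition sketch: split one bidirectional candidate into a
# forward-only and a backward-only candidate (inverse of combination).
def decompose(bi: MotionInfo) -> list:
    assert bi.pred_dir is PredDir.BI
    return [
        MotionInfo(PredDir.FORWARD, fwd=bi.fwd),   # keep the list-0 part
        MotionInfo(PredDir.BACKWARD, bwd=bi.bwd),  # keep the list-1 part
    ]
```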
  • In a possible design, the method may further include: when the number of candidate motion information entries in the candidate list of the to-be-processed image block is less than the target number, performing combination processing on two sets of original candidate motion information of the unidirectional prediction type included in the candidate list (also referred to as original candidate motion information of the unidirectional predictive encoding/decoding mode), to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the to-be-processed image block (also referred to as candidate motion information of the bidirectional predictive encoding/decoding mode), where P is an integer greater than or equal to 0.
  • The combination here refers to combining a set of original candidate motion information of the unidirectional prediction type with the forward prediction direction and another set of original candidate motion information of the unidirectional prediction type with the backward prediction direction to obtain a set of newly constructed candidate motion information of the bidirectional prediction type; in other words, combining a set of original candidate motion information using the forward predictive encoding/decoding mode with another set of original candidate motion information using the backward predictive encoding/decoding mode, to obtain a set of newly constructed candidate motion information using the bidirectional predictive encoding/decoding mode.
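A minimal sketch of the combination step, again reusing the hypothetical MotionInfo type: a forward-only candidate and a backward-only candidate are merged into one newly constructed bidirectional candidate.

```python
# Combination sketch: merge a forward-only and a backward-only candidate
# into one newly constructed bidirectional candidate.
def combine(fwd_cand: MotionInfo, bwd_cand: MotionInfo) -> MotionInfo:
    assert fwd_cand.pred_dir is PredDir.FORWARD
    assert bwd_cand.pred_dir is PredDir.BACKWARD
    return MotionInfo(PredDir.BI, fwd=fwd_cand.fwd, bwd=bwd_cand.bwd)
```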
  • In a possible design, the one or more spatial reference blocks include: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.
  • The one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located include: a first spatial neighboring block A1 located on the left side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a fourth spatial neighboring block A0 located on the lower left side of the current image block, and a fifth spatial neighboring block B2 located on the upper left side of the current image block.
  • In a possible design, the detecting, in the first preset order, of one or more spatial reference blocks of the current image block to obtain the M sets of original candidate motion information in the candidate list of the to-be-processed image block may include: sequentially detecting whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain the motion information of M1 image blocks with determined motion vectors, where M1 is an integer greater than or equal to 0; and adding M sets of motion information from the detected motion information of the M1 image blocks with determined motion vectors to the candidate list as candidate motion information, where M1 is greater than or equal to M.
  • The detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected only when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.
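A minimal sketch of this spatial detection order, assuming that "available" simply means the neighboring block exists and its motion information has been determined; the function and parameter names are hypothetical.

```python
# Spatial detection sketch: `blocks` maps a position name ("A1", "B1",
# "B0", "A0", "B2") to that neighboring block's motion information, or
# None if the block is unavailable.
def collect_spatial_candidates(blocks: dict, target_m: int) -> list:
    found = []
    for name in ("A1", "B1", "B0", "A0"):      # first preset order
        info = blocks.get(name)
        if info is not None:                   # block is available
            found.append(info)
    if len(found) < 4:                         # one of A1/B1/B0/A0 unavailable,
        info = blocks.get("B2")                # so B2 is also detected
        if info is not None:
            found.append(info)
    # M1 = len(found) blocks had determined motion vectors; only
    # M (<= M1) sets of motion information enter the candidate list.
    return found[:target_m]
```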
  • In a possible design, the one or more temporal reference blocks include: the lower right spatial neighboring block H of the co-located block of the current image block, and/or the lower right intermediate block C3, the upper left intermediate block C0, the upper left block TL, or the lower right block BR of the co-located block,
  • where the co-located block is the image block of the reference image that has the same size, shape, and coordinates as the current image block.
  • In a possible design, the detecting, in the second preset order, of one or more temporal reference blocks of the current image block to obtain the L sets of original candidate motion information in the candidate list of the to-be-processed image block includes: sequentially detecting whether the lower right spatial neighboring block H of the co-located block and the lower right intermediate block C3 of the co-located block are available, to obtain the motion information of L1 image blocks with determined motion vectors; or sequentially detecting whether the lower right spatial neighboring block H of the co-located block and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L2 image blocks with determined motion vectors; or sequentially detecting whether the lower right spatial neighboring block H, the lower right intermediate block C3, the upper left block TL, the lower right block BR, and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L3 image blocks with determined motion vectors; and adding L sets of the detected motion information to the candidate list as candidate motion information,
  • where L1, L2, and L3 are each greater than or equal to L and are all integers greater than or equal to 0.
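A minimal sketch of one variant of the temporal detection order (H first, then the lower right intermediate block C3 of the co-located block), under the same assumptions and hypothetical naming as the spatial sketch above.

```python
# Temporal detection sketch: `co_located` maps a position name in or
# around the co-located block to its motion information, or None.
def collect_temporal_candidates(co_located: dict, target_l: int) -> list:
    found = []
    for name in ("H", "C3"):                   # one variant of the order
        info = co_located.get(name)
        if info is not None:
            found.append(info)
    # L1 = len(found) may exceed L; only L sets enter the candidate list.
    return found[:target_l]
```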
  • In a possible design, the target number is a preset maximum number of candidate motion information entries in the candidate list of the current image block; or, the target number is the number of candidate motion information entries determined using an index identifier parsed from the code stream.
  • A second aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, where the candidate motion information is used to construct a candidate list for inter prediction. The apparatus includes: a spatial candidate motion information acquiring module, configured to detect, in a first preset order, one or more spatial reference blocks of the current image block, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block, where M is an integer greater than or equal to 0;
  • a temporal candidate motion information acquiring module, configured to detect, in a second preset order, one or more temporal reference blocks of the current image block, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0;
  • and an additional candidate motion information acquiring module, configured to: when the number of candidate motion information entries in the candidate list of the to-be-processed image block is less than the target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type (motion information of the bidirectional predictive encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type (motion information of the unidirectional predictive encoding/decoding mode) in the candidate list of the to-be-processed image block, where Q is an integer greater than or equal to 0.
  • A set of original candidate motion information of the bidirectional prediction type comprises: motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type (unidirectional predictive codec mode) is the forward predictive encoding/decoding mode and/or a set of motion information whose unidirectional prediction type is the backward predictive encoding/decoding mode, where the set of motion information of the forward predictive encoding/decoding mode includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward predictive encoding/decoding mode includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • If a set of newly constructed candidate motion information of the unidirectional prediction type overlaps with candidate motion information already present in the candidate list, that set of newly constructed candidate motion information of the unidirectional prediction type is not put into the candidate list.
  • Alternatively, the newly constructed candidate motion information of the unidirectional prediction type may first be placed in the candidate list, with the deduplication operation performed afterwards.
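A minimal sketch of this pruning rule, assuming candidates compare for equality (e.g., the frozen MotionInfo dataclass sketched earlier).

```python
# Pruning sketch: a newly constructed candidate enters the list only if
# an identical candidate is not already present.
def add_if_new(candidate_list: list, new_cand) -> bool:
    if new_cand in candidate_list:  # overlaps with an existing candidate
        return False                # do not put it into the list
    candidate_list.append(new_cand)
    return True
```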
  • In a possible design, the additional candidate motion information acquiring module is further configured to: perform combination processing on two sets of original candidate motion information of the unidirectional prediction type (unidirectional predictive encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type (candidate motion information of the bidirectional predictive encoding/decoding mode) in the candidate list of the to-be-processed image block, where P is an integer greater than or equal to 0.
  • In a possible design, the one or more spatial reference blocks include: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.
  • The one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located include: a first spatial neighboring block A1 located on the left side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a fourth spatial neighboring block A0 located on the lower left side of the current image block, and a fifth spatial neighboring block B2 located on the upper left side of the current image block.
  • In a possible design, the spatial candidate motion information acquiring module is configured to: sequentially detect whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain the motion information of M1 image blocks with determined motion vectors among the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2, where M1 is an integer greater than or equal to 0; and add M sets of motion information from the detected motion information of the M1 image blocks with determined motion vectors to the candidate list as candidate motion information, where M1 is greater than or equal to M. The detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected only when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.
  • In a possible design, the one or more temporal reference blocks include: the lower right spatial neighboring block H of the co-located block of the current image block, and/or the lower right intermediate block C3, the upper left intermediate block C0, the upper left block TL, or the lower right block BR of the co-located block,
  • where the co-located block is the image block of the reference image that has the same size, shape, and coordinates as the current image block.
  • In a possible design, the temporal candidate motion information acquiring module is configured to: sequentially detect whether the lower right spatial neighboring block H of the co-located block and the lower right intermediate block C3 of the co-located block are available, to obtain the motion information of L1 image blocks with determined motion vectors; or sequentially detect whether the lower right spatial neighboring block H of the co-located block and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L2 image blocks with determined motion vectors; or sequentially detect whether the lower right spatial neighboring block H, the lower right intermediate block C3, the upper left block TL, the lower right block BR, and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L3 image blocks with determined motion vectors; and add L sets of the detected motion information to the candidate list as candidate motion information,
  • where L1, L2, and L3 are each greater than or equal to L and are all integers greater than or equal to 0.
  • In a possible design, the apparatus is configured to encode or decode a video image, and the target number is a preset maximum number of candidate motion information entries in the candidate list of the current image block; or,
  • the apparatus is configured to decode a video image, and the target number is the number of candidate motion information entries determined using an index identifier parsed from the code stream.
  • A third aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, the candidate motion information being used to construct a candidate list for inter prediction, including: a processor and a memory coupled to the processor.
  • The processor is configured to: detect, in a first preset order, one or more spatial reference blocks of the current image block, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block, where M is an integer greater than or equal to 0; detect, in a second preset order, one or more temporal reference blocks of the current image block, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0; and, when the number of candidate motion information entries in the candidate list of the to-be-processed image block is less than the target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list (original candidate motion information of the bidirectional predictive encoding/decoding mode), to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the to-be-processed image block, where Q is an integer greater than or equal to 0.
  • A set of original candidate motion information of the bidirectional prediction type includes: motion information for a forward prediction direction and motion information for a backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type (also referred to as unidirectional predictive encoding/decoding mode) is the forward prediction direction (also referred to as forward predictive encoding/decoding mode) and/or a set of motion information whose unidirectional prediction type is the backward prediction direction, where the set of motion information of the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • In a possible design, the processor is further configured to: when the number of candidate motion information entries in the candidate list of the to-be-processed image block is less than the target number, perform combination processing on two sets of original candidate motion information of the unidirectional prediction type included in the candidate list (also referred to as original candidate motion information of the unidirectional predictive encoding/decoding mode), to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the to-be-processed image block (also referred to as candidate motion information of the bidirectional predictive encoding/decoding mode), where P is an integer greater than or equal to 0.
  • In a possible design, the one or more spatial reference blocks include: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.
  • The one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located include: a first spatial neighboring block A1 located on the left side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a fourth spatial neighboring block A0 located on the lower left side of the current image block, and a fifth spatial neighboring block B2 located on the upper left side of the current image block.
  • In a possible design, for the detecting, in the first preset order, of one or more spatial reference blocks of the current image block to obtain the M sets of original candidate motion information in the candidate list of the to-be-processed image block, the processor is configured to: sequentially detect whether the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are available, to obtain the motion information of M1 image blocks with determined motion vectors among them, where M1 is an integer greater than or equal to 0; and add M sets of motion information from the detected motion information of the M1 image blocks with determined motion vectors to the candidate list as candidate motion information, where M1 is greater than or equal to M. The detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected only when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.
  • If two or more sets of the detected motion information are identical, the processor adds only one of those sets of motion information to the candidate list.
  • In a possible design, the one or more temporal reference blocks include: the lower right spatial neighboring block H of the co-located block of the current image block, and/or the lower right intermediate block C3, the upper left intermediate block C0, the upper left block TL, or the lower right block BR of the co-located block,
  • where the co-located block is the image block of the reference image that has the same size, shape, and coordinates as the current image block.
  • In a possible design, for the detecting, in the second preset order, of one or more temporal reference blocks of the current image block to obtain the L sets of original candidate motion information in the candidate list of the to-be-processed image block, the processor is configured to: sequentially detect whether the lower right spatial neighboring block H of the co-located block and the lower right intermediate block C3 of the co-located block are available, to obtain the motion information of L1 image blocks with determined motion vectors; or sequentially detect whether the lower right spatial neighboring block H of the co-located block and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L2 image blocks with determined motion vectors; or sequentially detect whether the lower right spatial neighboring block H of the co-located block, the lower right intermediate block C3 of the co-located block, the upper left block TL of the co-located block, the lower right block BR of the co-located block, and the upper left intermediate block C0 of the co-located block are available, to obtain the motion information of L3 image blocks with determined motion vectors; and add L sets of the detected motion information to the candidate list as candidate motion information.
  • In a possible design, the target number is a preset maximum number of candidate motion information entries in the candidate list of the current image block; or, the target number is the number of candidate motion information entries determined using an index identifier parsed from the code stream.
  • A fourth aspect of the present application provides an apparatus for acquiring candidate motion information of an image block, configured to acquire candidate motion information to construct a candidate list for inter prediction, including: a processor and a memory coupled to the processor.
  • The processor 1201 is configured to: detect, in a first preset order, one or more spatial reference blocks of the current image block, to obtain M sets of original candidate motion information for constructing the candidate list of the current image block, where M is an integer greater than or equal to 0; detect, in a second preset order, one or more temporal reference blocks of the current image block, to obtain L sets of original candidate motion information for constructing the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0; and, when the number of candidate motion information entries for constructing the candidate list of the to-be-processed image block is less than the target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type included in the candidate motion information for constructing the candidate list of the to-be-processed image block, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type for constructing the candidate list of the to-be-processed image block,
  • where Q is an integer greater than or equal to 0.
  • A set of original candidate motion information of the bidirectional prediction type includes: motion information for a forward prediction direction and motion information for a backward prediction direction,
  • where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index;
  • and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type is the forward prediction direction and/or a set of motion information whose unidirectional prediction type is the backward prediction direction,
  • where the set of motion information of the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector pointing to the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector pointing to the second reference image corresponding to the second reference image index.
  • In a possible design, in order to further mine more meaningful candidate motion information and further improve the accuracy of motion vector prediction, the processor is further configured to: perform combination processing on two sets of original candidate motion information of the unidirectional prediction type included in the candidate motion information for constructing the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type for constructing the candidate list of the to-be-processed image block, where P is an integer greater than or equal to 0.
  • In a possible design, the one or more spatial reference blocks include: one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located, and/or one or more spatial reference blocks in that image that are not adjacent to the current image block.
  • The one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located include: a first spatial neighboring block A1 located on the left side of the current image block, a second spatial neighboring block B1 located on the upper side of the current image block, a third spatial neighboring block B0 located on the upper right side of the current image block, a fourth spatial neighboring block A0 located on the lower left side of the current image block, and a fifth spatial neighboring block B2 located on the upper left side of the current image block.
  • In a possible design, the one or more temporal reference blocks include: the lower right spatial neighboring block H of the co-located block of the current image block, and/or the lower right intermediate block C3, the upper left intermediate block C0, the upper left block TL, or the lower right block BR of the co-located block,
  • where the co-located block is the image block of the reference image that has the same size, shape, and coordinates as the current image block.
  • In a possible design, the target number is a preset maximum number of candidate motion information entries in the candidate list of the current image block; or, the target number is the number of candidate motion information entries determined using an index identifier parsed from the code stream.
  • A fifth aspect of the present application provides a video encoder for encoding an image block, including: an inter predictor, where the inter predictor includes the apparatus for acquiring candidate motion information of an image block according to the second, third, or fourth aspect,
  • and the inter predictor is configured to determine a prediction block of the current image block to be encoded based on the candidate motion information selected from the candidate list;
  • an entropy encoder, configured to encode into the code stream an index identifier indicating the candidate motion information selected for the current image block to be encoded; and a reconstructor, configured to reconstruct the image block based on the prediction block.
  • The inter predictor herein may include a motion estimation module and a motion compensation module, where the motion estimation module is configured to acquire candidate motion information of the current image block to be encoded so as to construct the candidate list, and the motion compensation module is configured to determine the prediction block of the current image block to be encoded based on the candidate motion information selected from the candidate list.
  • In a possible design, the inter predictor is further configured to select, from the plurality of candidate motion information entries included in the candidate list, the candidate motion information for the current image block to be encoded, where the selected candidate motion information encodes the current image block to be encoded with the lowest rate-distortion cost.
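A minimal sketch of this selection rule, assuming a hypothetical rd_cost callable that returns the rate-distortion cost of encoding the current block with a given candidate.

```python
# Encoder-side selection sketch: the candidate with the lowest
# rate-distortion cost is chosen, and its position in the list is the
# index identifier that the entropy encoder writes into the stream.
def select_candidate(candidate_list: list, rd_cost) -> int:
    costs = [rd_cost(cand) for cand in candidate_list]
    return min(range(len(costs)), key=costs.__getitem__)  # index identifier
```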
  • A sixth aspect of the present application provides a video decoder for decoding an image block from a code stream, including: an entropy decoder, configured to decode an index identifier from the code stream, where the index identifier is used to indicate the candidate motion information selected for the current image block to be decoded; an inter predictor, including the apparatus for acquiring candidate motion information of an image block according to the second, third, or fourth aspect,
  • where the inter predictor is configured to determine a prediction block of the current image block to be decoded based on the candidate motion information indicated by the index identifier; and a reconstructor, configured to reconstruct the image block based on the prediction block.
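A minimal sketch of the decoder-side counterpart: the index identifier parsed by the entropy decoder directly selects the candidate, so no motion search is performed at the decoder.

```python
# Decoder-side sketch: the parsed index identifier selects the candidate
# used to determine the prediction block of the block to be decoded.
def candidate_for_decoding(candidate_list: list, parsed_index: int):
    return candidate_list[parsed_index]
```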
  • a seventh aspect of the present application provides a computer readable storage medium having stored therein instructions that, when run on a computer, cause the computer to perform the method of the first aspect described above.
  • An eighth aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
  • A ninth aspect of the present application provides an electronic device, including the video encoder according to the fifth aspect, or the video decoder according to the sixth aspect, or the apparatus for acquiring candidate motion information of an image block according to the second, third, or fourth aspect.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application
  • FIG. 2 is a schematic block diagram of a video encoder in an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of a video decoder in an embodiment of the present application.
  • FIG. 4 is an exemplary flowchart of an encoding method performed by a video encoder in a merge mode in an embodiment of the present application
  • FIG. 5 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application
  • FIG. 6A and FIG. 6B are schematic diagrams of a coding unit and the adjacent-position image blocks and non-adjacent-position image blocks associated with it in the embodiments of the present application;
  • FIG. 7 is an exemplary flowchart of a method for acquiring candidate motion information of an image block according to an embodiment of the present application
  • FIG. 8 is another exemplary flowchart of a method for acquiring candidate motion information of an image block according to an embodiment of the present application.
  • FIG. 9 is an exemplary schematic diagram of adding a decomposed candidate motion vector to a merge mode candidate list in the embodiment of the present application.
  • FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate list in an embodiment of the present application.
  • FIG. 11 is a schematic block diagram of an apparatus for acquiring candidate motion information of an image block in an embodiment of the present application.
  • FIG. 12 is a schematic block diagram of an encoding device or a decoding device according to an embodiment of the present application.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 in an embodiment of the present application.
  • system 10 includes source device 12 that produces encoded video data that will be decoded by destination device 14 at a later time.
  • Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook computers, tablet computers, set top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like.
  • source device 12 and destination device 14 may be equipped for wireless communication.
  • Link 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14.
  • link 16 may include communication media that enables source device 12 to transmit encoded video data directly to destination device 14 in real time.
  • the encoded video data can be modulated and transmitted to destination device 14 in accordance with a communication standard (e.g., a wireless communication protocol).
  • Communication media can include any wireless or wired communication medium, such as a radio frequency spectrum or one or more physical transmission lines.
  • The communication medium may form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet).
  • Communication media can include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.
  • the encoded data may be output from output interface 22 to storage device 24.
  • encoded data can be accessed from storage device 24 by an input interface.
  • Storage device 24 may comprise any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray Disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory Or any other suitable digital storage medium for storing encoded video data.
  • storage device 24 may correspond to a file server or another intermediate storage device that may maintain encoded video produced by source device 12. Destination device 14 may access the stored video data from storage device 24 via streaming or download.
  • the file server can be any type of server capable of storing encoded video data and transmitting this encoded video data to destination device 14.
  • Example file servers include a web server, a file transfer protocol server, a network attached storage device, or a local disk drive.
  • Destination device 14 can access the encoded video data via any standard data connection that includes an Internet connection.
  • This data connection may include a wireless channel (eg, a Wi-Fi connection), a wired connection (eg, a cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
  • the transmission of encoded video data from storage device 24 may be streaming, downloading, or a combination of both.
  • system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • source device 12 includes video source 18, video encoder 20, and output interface 22.
  • output interface 22 can include a modulator/demodulator (modem) and/or a transmitter.
  • video source 18 may include sources such as video capture devices (eg, cameras), video archives containing previously captured video, video feed interfaces to receive video from video content providers And/or a computer graphics system for generating computer graphics data as source video, or a combination of these sources.
  • the video source 18 is a video camera
  • the source device 12 and the destination device 14 may form a so-called camera phone or video phone.
  • the techniques described in this application are illustratively applicable to video decoding and are applicable to wireless and/or wired applications.
  • Captured, pre-captured, or computer generated video may be encoded by video encoder 20.
  • the encoded video data can be transmitted directly to the destination device 14 via the output interface 22 of the source device 12.
  • the encoded video data may also (or alternatively) be stored on storage device 24 for later access by destination device 14 or other device for decoding and/or playback.
  • the destination device 14 includes an input interface 28, a video decoder 30, and a display device 32.
  • input interface 28 can include a receiver and/or a modem.
  • Input interface 28 of destination device 14 receives encoded video data via link 16.
  • The encoded video data communicated over link 16 or provided on storage device 24 may include various syntax elements generated by video encoder 20 for use by a video decoder such as video decoder 30 in decoding the video data.
  • These syntax elements may be included with encoded video data transmitted over a communication medium, stored on a storage medium, or stored on a file server.
  • Display device 32 may be integrated with destination device 14 or external to destination device 14.
  • Destination device 14 may include an integrated display device and may also be configured to interface with an external display device.
  • the destination device 14 can be a display device.
  • display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.
  • Video encoder 20 and video decoder 30 may operate in accordance with, for example, the next-generation video codec compression standard (H.266) currently under development, and may conform to the H.266 test model (JEM).
  • Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, for example, the ITU-T H.265 standard, also referred to as the high efficiency video coding standard, or the ITU-T H.264 standard, or extensions of these standards.
  • The ITU-T H.264 standard is alternatively referred to as MPEG-4 Part 10, also known as advanced video coding (AVC).
  • the techniques of this application are not limited to any particular decoding standard.
  • Other possible implementations of video compression standards include MPEG-2 and ITU-T H.263.
  • video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in a separate data stream.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
  • Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), Field Programmable Gate Array (FPGA), discrete logic, software, hardware, firmware, or any combination thereof.
  • the apparatus may store the instructions of the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present application.
  • Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
  • the present application may illustratively involve video encoder 20 "signaling" particular information to another device, such as video decoder 30.
  • Video encoder 20 may signal information by associating particular syntax elements with various encoded portions of the video data; that is, video encoder 20 may "signal" the data by storing the particular syntax elements into the header information of the various encoded portions of the video data.
  • these syntax elements may be encoded and stored (eg, stored to storage system 34 or file server 36) prior to being received and decoded by video decoder 30.
  • The term "signaling" may illustratively refer to the communication of syntax or other data used to decode the compressed video data, whether this communication occurs in real time or near real time or over a span of time, such as when a syntax element is stored to a medium at encoding time; the syntax element can then be retrieved by the decoding device at any time after being stored to the medium.
  • The JCT-VC developed the H.265 (HEVC) standard.
  • HEVC standardization is based on an evolution model of a video decoding device called the HEVC Test Model (HM).
  • The latest standard document for H.265 is available at http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), the full text of which is incorporated herein by reference.
  • The HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides nine intra-prediction coding modes, while the HM provides up to 35 intra-prediction coding modes.
  • JVET is committed to the development of the H.266 standard.
  • the H.266 standardization process is based on an evolution model of a video decoding device called the H.266 test model.
  • the algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is included in JVET-F1001-v2, which is incorporated herein by reference in its entirety.
  • the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
  • The HM may divide a video frame or image into a sequence of tree blocks, also referred to as largest coding units (LCUs) or CTUs, that contain both luma and chroma samples.
  • Treeblocks have similar purposes to macroblocks of the H.264 standard.
  • A slice contains several consecutive tree blocks in decoding order.
  • A video frame or image may be partitioned into one or more slices.
  • Each tree block can be split into coding units according to a quadtree. For example, a tree block, as the root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node that is split into four further child nodes.
  • The final non-splittable child nodes, which are the leaf nodes of the quadtree, comprise decoding nodes, e.g., decoded video blocks.
  • the syntax data associated with the decoded code stream may define the maximum number of times the tree block can be split, and may also define the minimum size of the decoded node.
  • the coding unit includes a decoding node and a prediction unit (PU) and a transform unit (TU) associated with the decoding node.
  • the size of the CU corresponds to the size of the decoding node and the shape must be square.
  • the size of the CU may range from 8 x 8 pixels up to a maximum of 64 x 64 pixels or larger.
  • Each CU may contain one or more PUs and one or more TUs.
  • syntax data associated with a CU may describe a situation in which a CU is partitioned into one or more PUs.
  • The partitioning mode may differ depending on whether the CU is skipped, or encoded in direct mode, intra prediction mode, or inter prediction mode.
  • A PU may be partitioned into non-square shapes.
  • syntax data associated with a CU may also describe a situation in which a CU is partitioned into one or more TUs according to a quadtree.
  • The shape of a TU can be square or non-square.
  • the HEVC standard allows for transforms based on TUs, which can be different for different CUs.
  • the TU is typically sized based on the size of the PU within a given CU defined for the partitioned LCU, although this may not always be the case.
  • the size of the TU is usually the same as or smaller than the PU.
  • The residual samples corresponding to the CU may be subdivided into smaller units using a quadtree structure called a "residual quadtree" (RQT).
  • the leaf node of the RQT can be referred to as a TU.
  • the pixel difference values associated with the TU may be transformed to produce transform coefficients, which may be quantized.
  • a PU contains data related to the prediction process.
  • When a PU is intra-mode encoded, the PU may include data describing the intra prediction mode of the PU.
  • When a PU is inter-mode encoded, the PU may include data defining a motion vector of the PU.
  • The data defining the motion vector of a PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or the reference image list of the motion vector (e.g., list 0, list 1, or list C).
  • TUs use transform and quantization processes.
  • a given CU with one or more PUs may also contain one or more TUs.
  • video encoder 20 may calculate a residual value corresponding to the PU.
  • The residual values comprise pixel difference values, which may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy decoding.
  • the present application generally refers to the term "video block” to refer to a decoding node of a CU.
  • the term "video block” may also be used herein to refer to a tree block containing a decoding node as well as a PU and a TU, eg, an LCU or CU.
  • a video sequence usually contains a series of video frames or images.
  • A group of pictures (GOP) illustratively includes a series of one or more video images.
  • the GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere, the syntax data describing the number of images included in the GOP.
  • Each slice of an image may contain slice syntax data describing the encoding mode of the corresponding image.
  • Video encoder 20 typically operates on the video blocks within individual video slices in order to encode the video data.
  • a video block may correspond to a decoding node within a CU.
  • Video blocks may have fixed or varying sizes and may vary in size depending on the specified decoding criteria.
  • The HM supports prediction with various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra prediction with a PU size of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%.
  • For example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
  • "N×N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
  • an N x N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
  • the pixels in the block can be arranged in rows and columns. Further, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • a block may include N x M pixels, where M is not necessarily equal to N.
  • video encoder 20 may calculate residual data for the TU of the CU.
  • a PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
  • the residual data may correspond to a pixel difference between a pixel of the uncoded image and a predicted value corresponding to the PU.
  • Video encoder 20 may form a TU that includes residual data for the CU, and then transform the TU to generate transform coefficients for the CU.
  • video encoder 20 may perform quantization of the transform coefficients.
  • Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
  • the quantization process can reduce the bit depth associated with some or all of the coefficients. For example, the n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
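  • As an illustration of this bit-depth reduction, the following minimal sketch (the function name and the simple right-shift rule are illustrative assumptions, not taken from any standard) rounds an n-bit coefficient down to an m-bit value:

```python
def quantize_coefficient(value: int, n: int, m: int) -> int:
    """Illustrative quantization: reduce an n-bit coefficient to m bits.

    A real codec derives the quantization step from a quantization
    parameter (QP); here we simply right-shift by (n - m) bits.
    """
    assert n > m >= 1
    shift = n - m
    # Shift the magnitude so negative values round toward zero as well.
    return value >> shift if value >= 0 else -((-value) >> shift)

# Example: a 10-bit coefficient value rounded down to an 8-bit value.
print(quantize_coefficient(515, n=10, m=8))  # -> 128
```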
  • the JEM model further improves the coding structure of video images.
  • a block coding structure called quadtree combined binary tree (QTBT) is introduced.
  • the QTBT structure does away with the separate CU, PU, and TU concepts in HEVC, and supports more flexible CU partitioning shapes.
  • One CU can be square or rectangular.
  • a CTU first performs quadtree partitioning, and the leaf nodes of the quadtree further perform binary tree partitioning.
  • there are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
  • the leaf nodes of the binary tree are called CUs, and the CUs of the JEM cannot be further divided during the prediction and transformation process, that is, the CUs, PUs, and TUs of the JEM have the same block size.
  • the maximum size of the CTU is 256 x 256 luma pixels.
  • video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded.
  • video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector using context adaptive variable length decoding (CAVLC), context adaptive binary arithmetic decoding (CABAC), syntax-based context adaptive binary arithmetic decoding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy decoding method.
  • Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 to decode the video data.
  • video encoder 20 may assign contexts within the context model to the symbols to be transmitted.
  • the context can be related to whether the adjacent value of the symbol is non-zero.
  • video encoder 20 may select a variable length code of the symbol to be transmitted. Codewords in variable length decoding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, the use of VLC can achieve the goal of saving code rate with respect to using equal length codewords for each symbol to be transmitted.
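  • To make this rate saving concrete, the following sketch (the symbol set, probabilities, and code table are invented for illustration) compares the expected codeword length of a variable length code against 2-bit equal-length codewords:

```python
# Hypothetical source: more probable symbols receive shorter codewords.
probabilities = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
vlc = {"a": "0", "b": "10", "c": "110", "d": "111"}  # prefix-free code

expected_vlc = sum(probabilities[s] * len(vlc[s]) for s in probabilities)
expected_fixed = 2.0  # four symbols require 2 bits each with equal-length codes

print(expected_vlc)    # 1.75 bits per symbol
print(expected_fixed)  # 2.0 bits per symbol
```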
  • the probability in CABAC can be determined based on the context assigned to the symbol.
  • a video encoder may perform inter prediction to reduce temporal redundancy between images.
  • a CU may have one or more prediction units PU as specified by different video compression codec standards.
  • multiple PUs may belong to the CU, or the PUs and CUs may be the same size.
  • the partition mode of the CU may be that the CU is not divided, or that the CU is divided into a single PU; in either case, the PU is used uniformly for description.
  • the video encoder can signal the motion information of the PU to the video decoder.
  • the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
  • the motion vector may indicate a displacement between an image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference block of the PU.
  • the reference block of the PU may be a part of the reference image that is similar to the image block of the PU.
  • the reference block may be located in a reference image indicated by the reference image index and the prediction direction indicator.
  • the video encoder may generate a candidate motion information list (hereinafter referred to as a candidate list) for each PU of the CU according to the merge prediction mode or the advanced motion vector prediction mode process.
  • Each candidate in the candidate list for the PU may represent a set of motion information.
  • the motion information may include a motion vector MV and reference image indication information.
  • the motion information may also include only one of the two, or both. For example, if the encoder and decoder sides agree on the reference image, the motion information may include only the motion vector.
  • the motion information represented by some of the candidates in the candidate list may be based on motion information of other PUs.
  • the present application may refer to the candidates as "original" candidate motion information.
  • for example, for the merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate locations and one original temporal candidate location.
  • the video encoder may also generate additional or additional candidate motion information by some means, such as inserting a zero motion vector as candidate motion information to generate additional candidate motion information. These additional candidate motion information are not considered raw candidate motion information and may be referred to as late or artificially generated candidate motion information in this application.
  • the techniques of the present application generally relate to techniques for generating a candidate list at a video encoder and techniques for generating the same candidate list at a video decoder.
  • the video encoder and video decoder may generate the same candidate list by implementing the same techniques used to construct the candidate list. For example, both a video encoder and a video decoder can construct a list with the same number of candidates (eg, five candidates).
  • the video encoder and decoder may first consider spatial candidates (e.g., neighboring blocks in the same image), then consider temporal candidates (e.g., candidates in different images), and finally may consider artificially generated candidates, until the required number of candidates has been added to the list.
  • a pruning operation may be utilized for certain types of candidate motion information to remove duplicates from the candidate list during candidate list construction, while for other types of candidates, pruning may not be used, in order to reduce decoder complexity.
  • a pruning operation may be performed to exclude candidates with repeated motion information from the list of candidates.
  • the artificially generated candidate may be added without performing a pruning operation on the artificially generated candidate.
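  • The ordering and pruning rules above can be sketched as follows (a minimal illustration; the data structures and the fixed list size are assumptions): spatial candidates are pruned against the list, temporal candidates follow, and artificially generated zero-vector candidates are appended without pruning until the required number is reached:

```python
MAX_CANDIDATES = 5  # e.g., a list with five candidates, as in the example above

def build_candidate_list(spatial, temporal):
    """spatial/temporal: sequences of (mv, ref_idx) tuples; None means unavailable."""
    candidates = []
    # Spatial candidates (neighboring blocks in the same image), with pruning.
    for cand in spatial:
        if cand is not None and cand not in candidates:
            candidates.append(cand)
        if len(candidates) == MAX_CANDIDATES:
            return candidates
    # Temporal candidates (candidates in different images), also pruned here.
    for cand in temporal:
        if cand is not None and cand not in candidates:
            candidates.append(cand)
        if len(candidates) == MAX_CANDIDATES:
            return candidates
    # Artificially generated candidates, added without a pruning operation.
    while len(candidates) < MAX_CANDIDATES:
        candidates.append(((0, 0), 0))  # zero motion vector, reference index 0
    return candidates
```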
  • the video encoder may select candidate motion information from the candidate list and output an index identifier indicating the selected candidate motion information in the code stream.
  • the selected candidate motion information may be motion information having a prediction block that produces the closest match to the PU being decoded.
  • the aforementioned index identification may indicate the location of the candidate motion information selected in the candidate list.
  • the video encoder may also generate a prediction block for the PU based on the reference block indicated by the motion information of the PU.
  • the motion information of the PU may be determined based on the selected candidate motion information. For example, in the merge mode, it is determined that the selected candidate motion information is the motion information of the PU.
  • the motion information of the PU may be determined based on the motion vector difference of the PU and the selected candidate motion information.
  • the video encoder may generate one or more residual image blocks (abbreviated as residual blocks) for the CU based on the predictive image blocks of the PU of the CU (referred to as prediction blocks for short) and the original image blocks for the CU.
  • the video encoder may then encode one or more residual blocks and output a code stream.
  • the code stream may include data for identifying selected candidate motion information in the candidate list of PUs.
  • the video decoder may determine motion information for the PU based on the selected candidate motion information in the candidate list of PUs.
  • the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate a prediction block for the PU based on one or more reference blocks of the PU.
  • the video decoder may reconstruct an image block for the CU based on the prediction block for the PU of the CU and one or more residual blocks for the CU.
  • the present application may describe a location or image block as having various spatial relationships with a CU or PU. This description may be interpreted to mean that the location or image block and the image block associated with the CU or PU have various spatial relationships.
  • the present application may refer to a PU that is currently being decoded by a video decoder as a current PU, also referred to as a current image block to be processed.
  • the present application may refer to a CU currently being decoded by a video decoder as a current CU.
  • the present application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used uniformly.
  • video encoder 20 may use inter prediction to generate prediction blocks and motion information for PUs of the CU.
  • the motion information of the PU may be the same or similar to the motion information of one or more neighboring PUs (ie, PUs whose image blocks are spatially or temporally near the image block of the PU). Because neighboring PUs often have similar motion information, video encoder 20 may encode motion information for the PU with reference to motion information of neighboring PUs. Encoding the motion information of the PU with reference to the motion information of the neighboring PU may reduce the number of coded bits required in the code stream indicating the motion information of the PU.
  • Video encoder 20 may encode motion information for the PU with reference to motion information of neighboring PUs in various manners. For example, video encoder 20 may indicate that the motion information for the PU is the same as the motion information for nearby PUs. The present application may use a merge mode to indicate that the motion information indicating the PU is the same as the motion information of the neighboring PU or may be derived from the motion information of the neighboring PU. In another possible implementation, video encoder 20 may calculate a Motion Vector Difference (MVD) for the PU. The MVD indicates the difference between the motion vector of the PU and the motion vector of the neighboring PU. Video encoder 20 may include the MVD instead of the motion vector of the PU in the motion information of the PU.
  • representing the MVD in the code stream requires fewer coded bits than representing the motion vector of the PU.
  • the present application may use the advanced motion vector prediction mode to refer to signaling the motion information of the PU to the decoding end by using the MVD and an index value identifying a candidate (i.e., candidate motion information).
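  • As a sketch of this advanced motion vector prediction signaling (variable names are illustrative, and a real encoder would also entropy-encode the outputs), the encoder sends an index identifying the candidate together with the MVD instead of the full motion vector:

```python
def amvp_encode(mv, candidate_list):
    """Pick the predictor minimizing the MVD magnitude; return (index, mvd)."""
    best_idx = min(
        range(len(candidate_list)),
        key=lambda i: abs(mv[0] - candidate_list[i][0]) + abs(mv[1] - candidate_list[i][1]),
    )
    mvp = candidate_list[best_idx]
    mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
    return best_idx, mvd  # written to the code stream instead of mv itself

def amvp_decode(idx, mvd, candidate_list):
    """Decoder side: reconstruct the motion vector as predictor + MVD."""
    mvp = candidate_list[idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```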
  • video encoder 20 may generate a candidate list for the PU.
  • the candidate list may include one or more candidates (ie, one or more sets of candidate motion information).
  • Each candidate in the candidate list for the PU represents a set of motion information.
  • the set of motion information may include a motion vector, a reference image list, and a reference image index corresponding to the reference image list.
  • video encoder 20 may select one candidate from the plurality of candidates in the candidate list for the PU. For example, the video encoder may compare each candidate with the PU being decoded and may select the candidate with the required (e.g., lowest) rate-distortion cost. Video encoder 20 may output a candidate index for the PU. The candidate index can identify the location of the selected candidate in the candidate list.
  • video encoder 20 may generate a prediction block for the PU based on the reference block indicated by the motion information of the PU.
  • the motion information of the PU may be determined based on the selected candidate motion information in the candidate list for the PU.
  • video decoder 30 may generate a candidate list for each of the PUs of the CU.
  • the candidate list generated by video decoder 30 for the PU may be the same as the candidate list generated by video encoder 20 for the PU.
  • the syntax elements parsed from the code stream may indicate the location of the candidate motion information selected in the candidate list of PUs.
  • video decoder 30 may generate a prediction block for the PU based on one or more reference blocks indicated by the motion information of the PU.
  • Video decoder 30 may determine motion information for the PU based on candidate motion information selected in the candidate list for the PU.
  • Video decoder 30 may reconstruct an image block for the CU based on the prediction block for the PU and the residual block for the CU.
  • in one possible implementation, the construction of the candidate list is independent of parsing, from the code stream, the position of the selected candidate in the candidate list, and the two may be performed in any order or in parallel.
  • in another possible implementation, the position of the selected candidate in the candidate list is first parsed from the code stream, and the candidate list is then constructed according to the parsed position; in this case, the complete candidate list does not need to be constructed, and only the candidates up to the parsed position need to be constructed, that is, the candidate at that position can be determined. For example, if the code stream is parsed to find that the selected candidate is the candidate whose index identifier is 3 in the candidate list, only the candidates with index identifiers 0 through 3 need to be constructed, and the candidate whose index identifier is 3 can be determined.
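  • A minimal sketch of this parse-first strategy (helper names are invented for illustration): the decoder reads the index from the code stream first and stops constructing the candidate list as soon as the candidate at that position is determined:

```python
def candidate_at_parsed_index(parsed_index, candidate_sources):
    """candidate_sources: iterable yielding (mv, ref_idx) or None, in list order."""
    candidates = []
    for cand in candidate_sources:
        if cand is None:
            continue  # unavailable position, skip it
        if cand not in candidates:
            candidates.append(cand)
        if len(candidates) == parsed_index + 1:
            return candidates[parsed_index]  # selected candidate determined; stop
    raise ValueError("code stream indicated an index beyond the available candidates")
```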
  • FIG. 2 is a schematic block diagram of a video encoder 20 in the embodiment of the present application.
  • Video encoder 20 may perform intra-frame decoding and inter-frame decoding of video blocks within a video slice.
  • Intra decoding relies on spatial prediction to reduce or remove spatial redundancy of video within a given video frame or image.
  • Inter-frame decoding relies on temporal prediction to reduce or remove temporal redundancy of video within adjacent frames or images of a video sequence.
  • the intra mode (I mode) may refer to any of a number of space based compression modes.
  • An inter mode such as unidirectional prediction (P mode) or bidirectional prediction (B mode) may refer to any of several time-based compression modes.
  • video encoder 20 includes a partitioning unit 35, a prediction unit 41, a reference image memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56.
  • the prediction unit 41 includes an inter prediction unit (not shown) and an intra prediction unit 46.
  • the inter prediction unit may include a motion estimation unit 42 and a motion compensation unit 44.
  • video encoder 20 may also include inverse quantization unit 58, inverse transform unit 60, and a summer (also referred to as reconstructor) 62.
  • a deblocking filter (not shown in Figure 2) may also be included to filter the block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 as needed.
  • additional loop filters (in-loop or post-loop) can also be used.
  • video encoder 20 receives video data, and segmentation unit 35 segments the data into video blocks.
  • This partitioning may also include partitioning into strips, image blocks, or other larger units, and, for example, video block partitioning based on the quadtree structure of the LCU and CU.
  • FIG. 2 exemplarily illustrates the components of video encoder 20 that encode video blocks within a video slice to be encoded. In general, a slice may be partitioned into multiple video blocks (and possibly into collections of video blocks called image blocks).
  • Prediction unit 41 may select one of a plurality of possible decoding modes for the current video block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, based on the encoding quality and a cost calculation result (e.g., the rate-distortion cost, RDcost). Prediction unit 41 may provide the resulting intra-coded or inter-coded block to summer 50 to generate residual block data, and may provide the resulting intra-coded or inter-coded block to summer 62 to reconstruct the encoded block for use as part of a reference image.
  • The inter prediction unit within prediction unit 41 performs inter-predictive decoding of the current video block relative to one or more prediction blocks in one or more reference images to provide temporal compression.
  • Motion estimation unit 42 is configured to determine an inter prediction mode for a video slice according to a predetermined pattern of the video sequence. The predetermined pattern may designate the video slices in the sequence as P slices, B slices, or GPB slices.
  • Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are separately illustrated for conceptual purposes.
  • motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors that estimate the motion of video blocks.
  • the motion vector may indicate the displacement of the PU of the video block within the current video frame or image relative to the predicted block within the reference image.
  • the prediction block is a block that is found to closely match, in terms of pixel difference, the PU of the video block to be decoded, and the pixel difference may be determined by the sum of absolute differences (SAD), the sum of squared differences (SSD), or another difference metric.
  • video encoder 20 may calculate a value of a sub-integer pixel location of a reference image stored in reference image memory 64. For example, video encoder 20 may interpolate values of a quarter pixel position, an eighth pixel position, or other fractional pixel position of a reference image. Accordingly, motion estimation unit 42 may perform a motion search with respect to the full pixel position and the fractional pixel position and output a motion vector having fractional pixel precision.
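  • The following NumPy-based sketch (array and function names are assumptions) illustrates the kind of SAD-driven search motion estimation unit 42 performs; for brevity it examines only full pixel positions, whereas a real search would also examine interpolated fractional pixel positions:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return int(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum())

def full_pel_search(cur_block, ref_frame, x, y, search_range=8):
    """Return the integer-precision motion vector minimizing SAD around (x, y)."""
    h, w = cur_block.shape
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + h > ref_frame.shape[0] or rx + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur_block, ref_frame[ry:ry + h, rx:rx + w])
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```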
  • Motion estimation unit 42 calculates the motion vector of the PU of the video block in the inter-coded slice by comparing the location of the PU with the location of the prediction block of the reference picture.
  • the reference images may be selected from a first reference image list (List 0) or a second reference image list (List 1), each of the lists identifying one or more reference images stored in the reference image memory 64.
  • Motion estimation unit 42 transmits the computed motion vector to entropy encoding unit 56 and motion compensation unit 44.
  • Motion compensation performed by motion compensation unit 44 may involve extracting or generating a prediction block based on motion vectors determined by motion estimation. After receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the prediction block pointed to by the motion vector in one of the reference picture lists.
  • the video encoder 20 forms a residual video block by subtracting the pixel value of the prediction block from the pixel value of the current video block being decoded, thereby forming a pixel difference value.
  • the pixel difference values form residual data for the block and may include both luminance and chrominance difference components.
  • Summer 50 represents one or more components that perform this subtraction.
  • Motion compensation unit 44 may also generate syntax elements associated with video blocks and video slices for video decoder 30 to use to decode video blocks of video slices.
  • the image containing the PU may be associated with two reference image lists called "list 0" and "list 1".
  • an image containing B slices may be associated with a list combination that is a combination of list 0 and list 1.
  • motion estimation unit 42 may perform uni-directional prediction or bi-directional prediction for the PU, where, in some possible implementations, bi-directional prediction is prediction based on reference images in the list 0 and list 1 reference image lists, respectively; in other possible implementations, bi-directional prediction is prediction based on a reconstructed future frame and a reconstructed past frame, in display order, of the current frame, respectively.
  • the motion estimation unit 42 may search for a reference block for the PU in the reference image of list 0 or list 1.
  • Motion estimation unit 42 may then generate a reference index indicating a reference picture containing the reference block in list 0 or list 1 and a motion vector indicating a spatial displacement between the PU and the reference block.
  • the motion estimation unit 42 may output a reference index, a prediction direction identifier, and a motion vector as motion information of the PU.
  • the prediction direction indicator may indicate that the reference index indicates the reference image in list 0 or list 1.
  • Motion compensation unit 44 may generate a predictive image block of the PU based on the reference block indicated by the motion information of the PU.
  • the motion estimation unit 42 may search for a reference block for the PU in the reference images in list 0 and may also search for another reference block for the PU in the reference images in list 1. Motion estimation unit 42 may then generate reference indices indicating the reference images containing the reference blocks in list 0 and list 1, and motion vectors indicating the spatial displacements between the reference blocks and the PU. The motion estimation unit 42 may output the reference indices and the motion vectors of the PU as motion information of the PU. Motion compensation unit 44 may generate the predictive image block of the PU based on the reference blocks indicated by the motion information of the PU.
  • motion estimation unit 42 does not output a complete set of motion information for the PU to entropy encoding unit 56. Rather, motion estimation unit 42 may signal the motion information of the PU with reference to the motion information of another PU. For example, motion estimation unit 42 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighboring PU. In this implementation, motion estimation unit 42 may indicate, in a syntax structure associated with the PU, an indication value that indicates to video decoder 30 that the PU has the same motion information as the neighboring PU or has motion information derived from the neighboring PU.
  • motion estimation unit 42 may identify candidates and motion vector differences (MVDs) associated with neighboring PUs in a syntax structure associated with the PU.
  • the MVD indicates the difference between the motion vector of the PU and the indicated candidate associated with the neighboring PU.
  • Video decoder 30 may use the indicated candidate and MVD to determine the motion vector of the PU.
  • prediction unit 41 may generate a candidate list for each PU of the CU.
  • One or more of the candidate lists may include one or more sets of original candidate motion information and one or more sets of additional candidate motion information derived from the original candidate motion information.
  • Intra prediction unit 46 within prediction unit 41 may perform intra-predictive decoding of the current video block relative to one or more neighboring blocks in the same image or slice as the current block to be decoded to provide spatial compression .
  • intra-prediction unit 46 may intra-predict the current block.
  • intra prediction unit 46 may determine an intra prediction mode to encode the current block.
  • intra-prediction unit 46 may encode the current block using various intra prediction modes, for example, during separate encoding passes, and intra-prediction unit 46 (or, in some possible implementations, mode selection unit 40) may select an appropriate intra prediction mode to use from the tested modes.
  • the video encoder 20 forms a residual video block by subtracting the prediction block from the current video block.
  • the residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52.
  • Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform (e.g., a discrete sine transform (DST)).
  • Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain (eg, a frequency domain).
  • Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54.
  • Quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameters. In some possible implementations, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform a scan.
  • entropy encoding unit 56 may entropy encode the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length decoding (CAVLC), context adaptive binary arithmetic decoding (CABAC), syntax-based context adaptive binary arithmetic decoding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy coding method or technique. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being decoded. After entropy encoding by entropy encoding unit 56, the encoded code stream may be transmitted to video decoder 30, or archived for later transmission to or retrieval by video decoder 30.
  • Entropy encoding unit 56 may encode information indicative of a selected intra prediction mode in accordance with the techniques of the present application.
  • Video encoder 20 may include, in the transmitted code stream configuration data, definitions of encoding contexts for various blocks, as well as indications of the MPM, the intra prediction mode index table, and the modified intra prediction mode index table to use for each of the contexts; the configuration data may include multiple intra prediction mode index tables and multiple modified intra prediction mode index tables (also referred to as codeword mapping tables).
  • Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block for the reference image.
  • Motion compensation unit 44 may calculate the reference block by adding the residual block to a prediction block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation.
  • Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reference block for storage in reference image memory 64.
  • the reference block may be used by motion estimation unit 42 and motion compensation unit 44 as reference blocks to inter-predict subsequent video frames or blocks in the image.
  • video encoder 20 may directly quantize the residual signal without processing by transform processing unit 52, and accordingly without processing by inverse transform unit 60; or, for some image blocks or image frames, video encoder 20 does not generate residual data, and accordingly does not need processing by transform processing unit 52, quantization unit 54, inverse quantization unit 58, and inverse transform unit 60; or, quantization unit 54 and inverse quantization unit 58 of video encoder 20 may be combined together.
  • FIG. 3 is a schematic block diagram of a video decoder 30 in the embodiment of the present application.
  • video decoder 30 includes an entropy decoding unit 80, a prediction unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference image memory 92.
  • the reference image memory 92 can also be placed outside of the video decoder 30.
  • the prediction unit 81 includes an inter prediction unit (not shown) and an intra prediction unit 84.
  • the inter prediction unit may be, for example, a motion compensation unit 82.
  • video decoder 30 may perform a decoding process that is exemplarily reciprocal to the encoding process described with respect to video encoder 20 of FIG. 2.
  • video decoder 30 receives from video encoder 20 an encoded video bitstream representing the video blocks of the encoded video slice and associated syntax elements.
  • Entropy decoding unit 80 of video decoder 30 entropy decodes the code stream to produce quantized coefficients, motion vectors, and other syntax elements.
  • entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction unit 81.
  • Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
  • intra-prediction unit 84 of prediction unit 81 may generate prediction data for the video block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
  • when the video image is decoded into an inter-frame decoded (e.g., B, P, or GPB) slice, motion compensation unit 82 of prediction unit 81 generates a prediction block of the video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80.
  • the prediction block may be generated from one of the reference images within one of the reference image lists.
  • Video decoder 30 may construct a reference image list (List 0 and List 1) using default construction techniques based on reference images stored in reference image memory 92.
  • Motion compensation unit 82 determines prediction information for the video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate the prediction block of the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra prediction or inter prediction) used to decode the video blocks of the video slice, an inter prediction slice type (e.g., B slice, P slice, or GPB slice), construction information of one or more of the reference image lists of the slice, motion vectors of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information for decoding the video blocks in the current video slice.
  • Motion compensation unit 82 may also perform interpolation based on the interpolation filter. Motion compensation unit 82 may use the interpolation filters as used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of the reference block. In this application, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use an interpolation filter to generate the prediction blocks.
  • motion compensation unit 82 may generate a candidate list for the PU. Data identifying the location of the selected candidate in the candidate list of the PU may be included in the code stream. After generating the candidate list for the PU, motion compensation unit 82 may generate a predictive image block for the PU based on one or more reference blocks indicated by the motion information of the PU. The reference block of the PU may be in a temporally different image than the PU. Motion compensation unit 82 may determine the motion information of the PU based on the motion information selected from the candidate list of the PU.
  • Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the code stream and decoded by entropy decoding unit 80.
  • the inverse quantization process may include determining the degree of quantization using the quantization parameters calculated by video encoder 20 for each of the video slices, and likewise determining the degree of inverse quantization that should be applied.
  • Inverse transform unit 88 applies an inverse transform (eg, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to produce a residual block in the pixel domain.
  • video decoder 30 sums the residual block from inverse transform unit 88 with the corresponding prediction block generated by motion compensation unit 82 to form a decoded video block. Summer 90 (i.e., the reconstructor) represents the component or components that perform this summation operation.
  • a deblocking filter can also be applied to filter the decoded blocks to remove blockiness artifacts as needed.
  • Other loop filters can also be used to smooth pixel transitions or otherwise improve video quality.
  • the decoded video block in a given frame or image is then stored in a reference image memory 92, which stores a reference image for subsequent motion compensation.
  • the reference image memory 92 also stores decoded video for later presentation on a display device, such as display device 32.
  • the techniques of the present application illustratively relate to inter-frame decoding. It should be understood that the techniques of the present application can be performed by any of the video decoders described in this application, including, for example, video encoder 20 and video decoder 30 as shown and described with respect to FIGS. 1 to 3. That is, in one possible implementation, prediction unit 41 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of blocks of video data. In another possible implementation, prediction unit 81 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of blocks of video data. Thus, references to a generic "video encoder" or "video decoder" may include video encoder 20, video decoder 30, or another video encoding or decoding unit.
  • video decoder 30 may be used to decode the encoded video bitstream. For example, for certain image blocks or image frames, entropy decoding unit 80 of video decoder 30 does not decode the quantized coefficients, and accordingly does not need to be processed by inverse quantization unit 86 and inverse transform unit 88.
  • FIG. 4 is an exemplary flowchart of encoding motion information of a current image block (eg, a current PU or a current CU) by a video encoder (eg, video encoder 20) performing a merge operation 200 in an embodiment of the present application.
  • the video encoder may perform a merge operation other than the merge operation 200.
  • the video encoder may perform a merge operation that includes more steps than merge operation 200, fewer steps, or steps different from those of merge operation 200.
  • the video encoder may perform the steps of the merge operation 200 in a different order or in parallel.
  • the encoder may also perform a merge operation 200 on the PU encoded in a skip mode.
  • the video encoder may generate a candidate list for the current PU (202).
  • the video encoder can generate a candidate list for the current PU in various ways. For example, the video encoder may generate a candidate list for the current PU according to one of the example techniques described below with respect to FIGS. 6A, 6B-10.
  • the candidate list for the current PU may include temporal candidate motion information (referred to as a temporal candidate).
  • the temporal candidate motion information may indicate motion information of a time-domain co-located PU.
  • the co-located PU may be spatially co-located with the current PU at the same location in the image frame, but in the reference image rather than the current image.
  • the present application may refer to a reference image including a PU corresponding to a time domain as a related reference image.
  • the present application may refer to a reference image index of an associated reference image as a related reference image index.
  • the current image may be associated with one or more reference image lists (eg, list 0, list 1, etc.).
  • the reference image index may indicate the reference image by indicating the position of the reference image in a certain reference image list.
  • the current image can be associated with a combined reference image list.
  • the associated reference image index is a reference image index of the PU that encompasses the reference index source location associated with the current PU.
  • the reference index source location associated with the current PU is a location adjacent to the current PU.
  • a PU may "cover" the particular location if the image block associated with the PU includes a particular location.
  • the reference index source location associated with the current PU is within the current CU.
  • if the PU is above or to the left of the current CU, the PU that covers the reference index source location associated with the current PU may be considered available.
  • the video encoder may need to access motion information of another PU of the current CU in order to determine a reference image containing the co-located PU. Accordingly, these video encoders may use motion information (ie, reference image index) of PUs belonging to the current CU to generate temporal candidates for the current PU. In other words, these video encoders can generate temporal candidates using motion information for PUs belonging to the current CU. Accordingly, the video encoder cannot generate a candidate list for the current PU and the PU that covers the reference index source location associated with the current PU in parallel.
  • a video encoder can explicitly set an associated reference image index without reference to a reference image index of any other PU. This may enable the video encoder to generate candidate lists for other PUs of the current PU and the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the associated reference picture index is not based on motion information of any other PU of the current CU. In some possible implementations in which the video encoder explicitly sets the relevant reference image index, the video encoder may always set the relevant reference image index to a fixed predefined preset reference image index (eg, 0). In this way, the video encoder may generate a temporal candidate based on the motion information of the co-located PU in the reference frame indicated by the preset reference image index, and may include the temporal candidate in the candidate list of the current CU.
  • the video encoder may explicitly signal the relevant reference image index in a syntax structure (e.g., an image header, a slice header, an APS, or another syntax structure).
  • the video encoder may signal, to the decoder, an associated reference image index for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the associated reference image index for each PU of the CU is equal to "1".
  • the associated reference image index can be set implicitly rather than explicitly.
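  • A sketch of the explicit fixed-index variant above (function and parameter names are illustrative): because the temporal candidate depends only on the co-located PU in the reference image indicated by the preset index, no motion information of other PUs of the current CU is consulted, and the candidate lists of all PUs of the CU can be generated in parallel:

```python
PRESET_REF_IDX = 0  # fixed predefined preset reference image index (e.g., 0)

def temporal_candidate(colocated_pu_motion):
    """colocated_pu_motion: motion vector of the co-located PU in the reference
    frame indicated by PRESET_REF_IDX, or None if that PU is unavailable."""
    if colocated_pu_motion is None:
        return None
    # No other PU of the current CU is consulted, so this can run in parallel
    # with candidate list generation for every other PU of the CU.
    return (colocated_pu_motion, PRESET_REF_IDX)
```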
  • the video encoder may generate motion information for the PU of the current CU using the motion information of the PU in the reference image indicated by the reference image index of the PU covering the location outside the current CU. A time candidate, even if these locations are not strictly adjacent to the current PU.
  • the video encoder may generate a predictive image block (204) associated with the candidate in the candidate list.
  • the video encoder may generate the predictive image block associated with a candidate by determining the motion information of the current PU based on the motion information of the indicated candidate, and then generating the predictive image block based on the one or more reference blocks indicated by the motion information of the current PU.
  • The video encoder may select one of the candidates from the candidate list (206).
  • the video encoder can select candidates in a variety of ways. For example, the video encoder may select one of the candidates based on a rate-distortion cost analysis for each of the predictive image blocks associated with the candidate.
  • the video encoder may output an index of the candidate (208).
  • the index may indicate the location of the selected candidate in the candidate list.
  • the index can be expressed as "merge_idx".
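  • Putting steps 202 through 208 together, a minimal sketch of merge operation 200 might look as follows (the rate-distortion cost function is a placeholder assumption; a real encoder derives it from the predictive image blocks):

```python
def merge_operation(current_pu, candidates, rd_cost):
    """Select the merge candidate with the lowest rate-distortion cost.

    candidates: candidate list generated for the current PU (step 202).
    rd_cost(pu, candidate) -> float: placeholder cost of the predictive image
    block associated with the candidate (steps 204 and 206).
    """
    costs = [rd_cost(current_pu, c) for c in candidates]
    merge_idx = min(range(len(costs)), key=costs.__getitem__)
    return merge_idx  # step 208: the index ("merge_idx") output in the code stream
```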
  • FIG. 5 is an exemplary flowchart of motion compensation performed by a video decoder (e.g., video decoder 30) in an embodiment of the present application.
  • the video decoder may receive an indication for the selected candidate for the current PU (222). For example, the video decoder may receive a candidate index indicating the location of the selected candidate within the current PU's candidate list.
  • the video decoder may receive the first candidate index and the second candidate index.
  • the first candidate index indicates the location of the selected candidate for the list 0 motion vector of the current PU in the candidate list.
  • the second candidate index indicates the location of the selected candidate for the list 1 motion vector for the current PU in the candidate list.
  • a single syntax element can be used to identify two candidate indices.
  • the video decoder can generate a candidate list for the current PU (224).
  • the video decoder can generate this candidate list for the current PU in various ways.
  • the video decoder may use the techniques described below with reference to Figures 6A, 6B-10 to generate a candidate list for the current PU.
  • the video decoder may explicitly or implicitly set a reference image index identifying the reference image that includes the co-located PU, as previously described.
  • the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidates in the candidate list of the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidates and the one or more MVDs indicated in the code stream.
  • the reference image index and the prediction direction identifier of the current PU may be the same as the reference image index and the prediction direction identifier of the one or more selected candidates.
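  • The decoder-side branch between the two modes can be sketched as follows (a minimal illustration; the candidate layout is an assumption):

```python
def derive_motion_info(mode, selected_candidate, mvd=None):
    """selected_candidate: (mv, ref_idx, pred_dir) of the chosen candidate."""
    mv, ref_idx, pred_dir = selected_candidate
    if mode == "merge":
        return mv, ref_idx, pred_dir  # motion information copied as-is
    if mode == "amvp":
        assert mvd is not None
        mv = (mv[0] + mvd[0], mv[1] + mvd[1])  # predictor plus parsed MVD
        return mv, ref_idx, pred_dir
    raise ValueError(f"unknown mode: {mode}")
```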
  • the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
  • when the video decoder generates the candidate list for the current PU (224), the process of collecting candidates can be ended once the number of available candidates collected equals the number determined from the received candidate index.
  • FIG. 6A is an exemplary schematic diagram of a coding unit (CU) and its associated spatial neighboring image blocks and time domain neighboring image blocks in an embodiment of the present application, illustrating CU 600 and exemplary candidate locations 1 to 10 associated with CU 600.
  • Candidate positions 1 through 5 represent spatial candidates in the same image as CU 600.
  • Candidate position 1 is located to the left of CU 600.
  • Candidate position 2 is located above CU 600.
  • Candidate position 3 is located at the upper right of CU 600.
  • Candidate position 4 is located at the lower left of CU 600.
  • Candidate position 5 is located at the upper left of CU 600.
  • Candidate locations 6 through 10 represent temporal candidates associated with co-located block 602 of CU 600, where the co-located block is the image block in the reference image (i.e., an adjacent encoded image) having the same size, shape, and coordinates as CU 600.
  • the candidate location 6 is located in the lower right corner of the co-located block 602.
  • the candidate location 7 is located at the lower right middle of the co-located block 602.
  • the candidate location 8 is located at the upper left corner of the co-located block 602.
  • the candidate location 9 is located at the lower right corner of the co-located block 602.
  • the candidate location 10 is located at the upper left middle position of the co-located block 602.
  • FIG. 6A is an illustrative implementation that provides candidate locations for an inter prediction module (e.g., motion estimation unit 42 or motion compensation unit 82) to generate a candidate list.
  • the spatial candidate location and the temporal candidate location in FIG. 6A are merely illustrative, and the candidate location includes but is not limited thereto.
  • the spatial candidate location may also optionally include a location within a preset distance from the image block to be processed, but not adjacent to the image block to be processed.
  • this type of location can be as shown by 6 to 27 in Figure 6B.
  • FIG. 6B is an exemplary schematic diagram of a coding unit and a spatial neighboring image block associated therewith in the embodiment of the present application.
  • positions of image blocks that are in the same image frame as the image block to be processed, that have been reconstructed by the time the image block to be processed is processed, and that are not adjacent to the image block to be processed also belong to the range of spatial candidate positions.
  • such positions are referred to herein as spatial non-contiguous image blocks, and it should be understood that the spatial candidates may be taken from one or more of the positions shown in FIG. 6B.
  • FIG. 7 is a schematic flowchart showing an acquisition process 700 of candidate motion information of an image block according to an embodiment of the present application.
  • Process 700 may be performed by video encoder 20 or video decoder 30, and in particular, may be performed by an inter prediction unit of video encoder 20 or an inter prediction unit of video decoder 30.
  • in video encoder 20, the inter prediction unit may illustratively include motion estimation unit 42 and motion compensation unit 44.
  • in video decoder 30, the inter prediction unit may illustratively include motion compensation unit 82.
  • the inter prediction unit may generate a candidate motion information list for the PU.
  • the candidate motion information list may include one or more original candidate motion information and one or more additional candidate motion information derived from the original candidate motion information.
  • process 700 can include an acquisition process 710 of original candidate motion information and an acquisition process 730 of additional candidate motion information. Process 700 is described as a series of steps or operations; it should be understood that the steps of process 700 may be executed in various orders and/or concurrently, and are not limited to the execution sequence shown in FIG. 7. Assuming that a video data stream having multiple video frames is being processed by a video encoder or video decoder, process 700, comprising the following steps, is performed to predict candidate motion information for a current image block of a current video frame:
  • Step 711: Detect one or more spatial reference blocks of the current image block according to a first preset sequence, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block (or to obtain M sets of original candidate motion information for constructing the candidate list of the to-be-processed image block), where M is an integer greater than or equal to 0.
  • the detection herein may include an "available" check as referred to elsewhere herein, or may include the "available" check described elsewhere herein together with a pruning (e.g., de-redundancy) process, details of which are not repeated.
  • the one or more spatial reference blocks of the current image block include: one or more spatial reference blocks that are in the image in which the current image block is located and that are adjacent to the current image block, and/or one or more spatial reference blocks that are in the image in which the current image block is located and that are not adjacent to the current image block.
  • as illustrated in the figures, the one or more spatial reference blocks adjacent to the current image block in the image in which the current image block is located may include: a fourth spatial neighboring block A0 located at the lower left of the current image block, a first spatial neighboring block A1 located at the left of the current image block, a third spatial neighboring block B0 located at the upper right of the current image block, a second spatial neighboring block B1 located above the current image block, or a fifth spatial neighboring block B2 located at the upper left of the current image block.
  • the one or more spatial reference blocks not adjacent to the current image block in the image in which the current image block is located may include: a first spatial non-contiguous image block, a second spatial non-contiguous image block, a third spatial non-contiguous image block, and so on.
  • in step 711, the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, the fourth spatial neighboring block A0, and the fifth spatial neighboring block B2 are sequentially detected for availability, to obtain motion information of M1 image blocks with determined motion vectors among them, where M1 is an integer greater than or equal to 0.
  • the detection condition of the fifth spatial neighboring block B2 includes: the fifth spatial neighboring block B2 is detected only when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable.
  • in step 711, the first spatial non-contiguous image block, the second spatial non-contiguous image block, and the third spatial non-contiguous image block may also be used.
  • the physical meaning of "available” may refer to the foregoing. The description will not be repeated.
  • assume that the motion vector of the first spatial neighboring block A1, the motion vector of the second spatial neighboring block B1, the motion vector of the third spatial neighboring block B0, the motion vector of the fourth spatial neighboring block A0, the motion vector obtained by the ATMVP technique, the motion vector of the fifth spatial neighboring block B2, and a further derived motion vector are denoted MVL, MVU, MVUR, MVDL, MVA, MVUL, and MVS, respectively, and that the motion vectors of the first spatial non-contiguous image block, the second spatial non-contiguous image block, and the third spatial non-contiguous image block are MV0, MV1, and MV2, respectively; the blocks may then be checked in the following order to obtain the M candidates (i.e., M candidate motion vectors) used in constructing the candidate list:
  • Example 1: MVL, MVU, MVUR, MVDL, MV0, MV1, MV2, MVA, MVUL, MVS;
  • Example 2: MVL, MVU, MVUR, MVDL, MVA, MV0, MV1, MV2, MVUL, MVS;
  • Example 3: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
  • Example 4: MVL, MVU, MVUR, MVDL, MVA, MVUL, MVS, MV0, MV1, MV2;
  • Example 5: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MVS, MV2;
  • Example 6: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MV2, MVS;
  • Example 7: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
  • Examples 1 through 7 exemplarily show several possible arrangements of the M original candidate motion vectors used for constructing a candidate list. Based on the motion vectors of spatial non-contiguous image blocks, there may be other ways of composing the candidate list and arranging the candidates in the list, which is not limited here.
  • the motion vectors of the spatial non-contiguous image blocks (for example, MV0, MV1, and MV2) may also be arranged in different orders, which is not limited in this embodiment of the present application.
  • in this embodiment, the motion vectors of spatial non-contiguous image blocks are also used as spatial candidates in the candidate list of the to-be-processed block, so that more spatial a priori coding information is utilized to improve coding performance.
  • Step 713: Detect one or more time domain reference blocks of the current image block according to a second preset sequence, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block (or to obtain L sets of original candidate motion information for constructing the candidate list of the to-be-processed image block), where L is an integer greater than or equal to 0.
  • the one or more time domain reference blocks of the current image block may be understood as image blocks in the co-located block of the current image block, or spatial neighboring blocks of the co-located block of the current image block, and may include, for example:
  • a lower-right spatial neighboring block H of the co-located block of the current image block, an upper-left middle block C0 of the co-located block, a lower-right middle block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, where the co-located block is an image block in the reference image having the same size, shape, and coordinates as the current image block.
  • In an implementation of step 713, the lower-right spatial neighboring block H of the co-located block and the lower-right intermediate block C3 of the co-located block are detected sequentially to obtain motion information of L1 determined-motion-vector image blocks; or
  • other detection orders may be used to obtain motion information of L2 or L3 determined-motion-vector image blocks, where L1 is equal to or greater than L, L2 is equal to or greater than L, L3 is equal to or greater than L, and L1, L2, and L3 are all integers greater than or equal to 0.
  • The motion information of the different time domain reference blocks may also be arranged in different orders, which is not limited in this embodiment of the present application.
  • The detection condition of a time domain reference block other than the lower-right spatial neighboring block H of the co-located block may include: the lower-right spatial neighboring block H of the co-located block is unavailable, or the number of candidate motion information entries in the candidate list is less than the target number. A small sketch of this fallback is given below.
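  • A minimal C++ sketch of the temporal fallback; the availability modelling and function shape are assumptions for illustration only:

    // Try the lower-right spatial neighboring block H of the co-located
    // block first; a further temporal block (here C3) is detected when H
    // is unavailable or the list is still below the target number.
    #include <cstddef>
    #include <optional>
    #include <vector>

    struct Mv { int x = 0, y = 0; };

    void appendTemporalCandidates(const std::optional<Mv>& H,
                                  const std::optional<Mv>& C3,
                                  std::vector<Mv>& list, std::size_t target) {
        if (H && list.size() < target)
            list.push_back(*H);
        // detection condition for the further temporal block: H was
        // unavailable, or the candidate count is still below the target
        if (list.size() < target && C3)
            list.push_back(*C3);
        // (scaling of the temporal motion vector to the current reference
        // distance is omitted for brevity)
    }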
  • Step 731: when the number of candidate motion information entries in the candidate list of the image block to be processed is smaller than the target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type (also referred to as original candidate motion information of the bidirectional prediction encoding/decoding mode) included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type (also referred to as candidate motion information of the unidirectional prediction encoding/decoding mode) in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.
  • A set of original candidate motion information of the bidirectional prediction type may include: motion information for the forward prediction direction and motion information for the backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index;
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type may include: a set of motion information whose unidirectional prediction type is the forward prediction direction and/or a set of motion information whose unidirectional prediction type is the backward prediction direction, where the set of motion information of the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index. It should be understood that if newly constructed candidate motion information duplicates an existing candidate in the candidate list, it does not need to be added to the candidate list.
  • the method further includes:
  • Step 733: when the number of candidate motion information entries in the candidate list of the image block to be processed is smaller than the target number, combine two sets of original candidate motion information of the unidirectional prediction type (unidirectional prediction encoding/decoding mode) included in the candidate list, to obtain P sets of newly constructed candidate motion information (candidate motion information of the bidirectional prediction encoding/decoding mode) in the candidate list of the to-be-processed image block, where P is an integer greater than or equal to 0.
  • Here, combination means that motion information using the forward prediction encoding/decoding mode (i.e., one set of original candidate motion information of the unidirectional prediction type) and motion information using the backward prediction encoding/decoding mode (i.e., another set of original candidate motion information of the unidirectional prediction type) are combined to obtain motion information using the bidirectional prediction encoding/decoding mode (i.e., one set of newly constructed candidate motion information of the bidirectional prediction type).
  • For example, one piece of motion information using the forward prediction encoding/decoding mode includes: reference image list list0, a reference index of 1, and a motion vector of (-3, -5).
  • Another piece of motion information using the backward prediction encoding/decoding mode includes: reference image list list1, a reference index of 0, and a motion vector of (3, 5).
  • The combined motion information of the bidirectional prediction encoding/decoding mode then includes: forward prediction motion information with reference image list list0, a reference index of 1, and a motion vector of (-3, -5); and backward prediction motion information with reference image list list1, a reference index of 0, and a motion vector of (3, 5). A minimal sketch of this combination step is given below.
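  • The following C++ sketch reuses the numbers of the example above; the struct names are assumptions, not taken from the patent:

    // Combine a forward (list0) and a backward (list1) unidirectional
    // candidate into one bidirectional candidate.
    #include <cassert>
    #include <optional>

    struct Mv { int x = 0, y = 0; };
    struct UniPred { int refList; int refIdx; Mv mv; };  // refList: 0 or 1
    struct Candidate { std::optional<UniPred> l0, l1; }; // bi if both set

    Candidate combine(const UniPred& fwd, const UniPred& bwd) {
        assert(fwd.refList == 0 && bwd.refList == 1);
        return Candidate{fwd, bwd};
    }

    int main() {
        UniPred fwd{0, 1, {-3, -5}};      // list0, ref index 1, MV (-3,-5)
        UniPred bwd{1, 0, {3, 5}};        // list1, ref index 0, MV (3,5)
        Candidate bi = combine(fwd, bwd); // bidirectional candidate as above
        (void)bi;
        return 0;
    }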
  • Decomposition is the inverse process of combination: motion information using the bidirectional prediction encoding/decoding mode (i.e., one set of original candidate motion information of the bidirectional prediction type) is split into motion information using the backward prediction encoding/decoding mode (i.e., one set of newly constructed candidate motion information of the unidirectional prediction type) and motion information using the forward prediction encoding/decoding mode (i.e., another set of newly constructed candidate motion information of the unidirectional prediction type).
  • For example, motion information in the bidirectional prediction encoding/decoding mode includes: forward prediction motion information with reference image list list0, a reference index of 1, and a motion vector of (-3, -5); and backward prediction motion information with reference image list list1, a reference index of 0, and a motion vector of (3, 5).
  • By decomposition, motion information using the forward prediction encoding/decoding mode (reference image list list0, a reference index of 1, a motion vector of (-3, -5)) and motion information using the backward prediction encoding/decoding mode (reference image list list1, a reference index of 0, a motion vector of (3, 5)) can be obtained respectively. A minimal sketch of this decomposition step is given below.
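  • The decomposition step can be sketched as the exact inverse, again with assumed type names; as noted above, a split-off candidate is only added to the list if it does not duplicate an existing entry:

    // Split one bidirectional candidate into its list0 (forward) and
    // list1 (backward) unidirectional parts.
    #include <optional>
    #include <utility>

    struct Mv { int x = 0, y = 0; };
    struct UniPred { int refList; int refIdx; Mv mv; };
    struct Candidate { std::optional<UniPred> l0, l1; };

    std::pair<std::optional<UniPred>, std::optional<UniPred>>
    decompose(const Candidate& bi) {
        return {bi.l0, bi.l1};  // forward part, backward part
    }

    int main() {
        Candidate bi{UniPred{0, 1, {-3, -5}}, UniPred{1, 0, {3, 5}}};
        auto [fwd, bwd] = decompose(bi);
        // fwd: list0, ref 1, MV (-3,-5); bwd: list1, ref 0, MV (3,5)
        (void)fwd; (void)bwd;
        return 0;
    }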
  • The embodiment of the present application may further include:
  • Step 735: when the number of candidate motion information entries in the candidate list of the image block to be processed is still smaller than the target number, for example, if the additional candidates generated in the foregoing manner are still insufficient, the video encoder or video decoder may also insert zero motion vectors as candidate motion information to generate additional candidates. Such additional candidate motion information is not considered original candidate motion information and may be referred to in this application as late or artificially generated candidate motion information.
  • In this embodiment, by constructing additional candidate motion information (e.g., candidate motion information of the unidirectional prediction type generated by decomposition, and combined candidate motion information of the bidirectional prediction type), more available candidate motion information is obtained for constructing the candidate list, so that the number of candidates in the candidate list can satisfy the target number (for example, the preset maximum number of candidate motion information entries in the candidate list, or the number of candidate motion information entries determined using the index identifier parsed from the code stream). A minimal sketch of the zero-vector padding of step 735 is given below.
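  • A minimal C++ sketch of the padding step follows; incrementing the reference index for each inserted zero candidate is an assumption made here for illustration:

    // Pad the candidate list with zero motion vectors until it reaches the
    // target number; these entries are the artificially generated
    // candidates described above.
    #include <cstddef>
    #include <vector>

    struct Mv { int x = 0, y = 0; };
    struct Candidate { Mv mv; int refIdx; };

    void padWithZeroMvs(std::vector<Candidate>& list, std::size_t target) {
        int refIdx = 0;
        while (list.size() < target)
            list.push_back({Mv{0, 0}, refIdx++});  // artificially generated
    }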
  • The method for acquiring candidate motion information in this embodiment can be applied to the inter-frame encoding/decoding process of video images, thereby improving coding performance.
  • FIG. 8 is another exemplary flowchart of a method for acquiring candidate motion information of an image block in an embodiment of the present application.
  • Process 800 may be performed by a video encoding end (e.g., video encoder 20) or a video decoding end (e.g., video decoder 30).
  • the schematic process of the video encoding end acquiring candidate motion information to construct a candidate list is as follows:
  • Steps 801 to 805: in the merge mode, during the process of collecting candidate motion information, detect motion information of the spatial neighboring blocks of the current coding block, and if available, use it as candidate motion information;
  • Step 807: when the number of available candidate motion information entries has not reached the maximum number of candidates, detect whether motion information of a time domain reference block can be used as candidate motion information;
  • Step 809: when it can be used as candidate motion information, determine whether the number of available candidate motion information entries has reached the preset maximum number of candidates;
  • Step 811: when the number of available candidate motion information entries has not reached the maximum number of candidates, construct bidirectional prediction motion information by combining the existing candidate motion information, and determine whether the newly constructed bidirectional motion information can be used as candidate motion information;
  • Steps 813 to 817: when the number of available candidate motion information entries has not reached the maximum number of candidates, construct unidirectional prediction motion information from the existing bidirectionally predicted candidate motion information, and determine whether the newly constructed unidirectional prediction motion information can be used as candidate motion information;
  • Step 825: when the number of available candidate motion information entries reaches the maximum number of candidates, end the process of collecting candidate motion information;
  • Step 823: when the number of available candidate motion information entries has not reached the maximum number of candidates, continue the process of collecting candidate motion information. A minimal driver loop for this encoder-side flow is sketched below.
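  • Put together, the encoder-side flow of steps 801 to 825 can be sketched as a simple driver loop; the stage functions stand in for the spatial, temporal, combination, and decomposition steps sketched earlier and are assumptions for illustration:

    // Run the collection stages in order, stopping as soon as the list
    // reaches the preset maximum number of candidates (step 825);
    // otherwise collection continues with the next stage (step 823).
    #include <cstddef>
    #include <functional>
    #include <vector>

    struct Candidate { /* motion information fields omitted */ };
    using Stage = std::function<void(std::vector<Candidate>&)>;

    void collectCandidates(std::vector<Candidate>& list,
                           std::size_t maxCandidates,
                           const std::vector<Stage>& stages) {
        for (const Stage& stage : stages) {
            if (list.size() >= maxCandidates) break;  // step 825: stop
            stage(list);                              // step 823: continue
        }
    }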
  • the schematic process of the video decoder acquiring candidate motion information to construct a candidate list is as follows:
  • Steps 801 to 805: in the merge mode, during the process of collecting candidate motion information, detect motion information of the spatial neighboring blocks of the current decoding block; if available, use it as candidate motion information and compare the number of available candidate motion information entries with the target number determined by the index value received by the video decoder;
  • If the two numbers are the same, the motion information of the spatial neighboring block of the current decoding block is used as the best candidate motion information (i.e., the candidate motion information selected for the current image block to be decoded, also referred to as target candidate motion information), and the process of collecting candidate motion information is ended;
  • If the number of available candidate motion information entries differs from the target number determined by the video decoder, step 807 is performed;
  • Steps 807 to 809: detect whether motion information of a time domain reference block can be used as candidate motion information; if available, use it as candidate motion information and compare the number of available candidate motion information entries with the target number determined by the index value received by the video decoder;
  • If the two numbers are the same, the motion information of the current time domain reference block is determined to be the motion information of the image block to be decoded, or the motion information of the image block to be decoded is determined using the motion information of the current time domain reference block, and the process of collecting candidate motion information is ended;
  • If the number of available candidate motion information entries differs from the target number determined by the video decoder, continue to step 811;
  • Steps 811 to 815: combine the existing candidate motion information to construct bidirectional prediction motion information, and determine whether the newly constructed bidirectional prediction motion information can be used as candidate motion information; if available, use it as candidate motion information and compare the number of available candidate motion information entries with the target number determined by the index value received by the video decoder;
  • If the two numbers are the same, determine that the currently constructed bidirectional prediction motion information is the best candidate motion information (that is, determine the currently constructed bidirectional prediction motion information to be the motion information of the image block to be decoded, or determine the motion information of the image block to be decoded using the currently constructed bidirectional prediction motion information), and end the process of collecting candidate motion information;
  • If the number of available candidate motion information entries differs from the target number determined by the index value received by the video decoder, continue to step 817;
  • Steps 817 to 821: construct unidirectional prediction motion information from the existing bidirectionally predicted candidate motion information, and determine whether the newly constructed unidirectional prediction motion information can be used as candidate motion information; if it can, determine whether the number of available candidate motion information entries matches the target number determined by the index value received by the video decoder;
  • If it matches, determine the newly constructed unidirectional prediction motion information to be the best candidate motion information (that is, determine it to be the motion information of the image block to be decoded, or determine the motion information of the image block to be decoded using the newly constructed unidirectional prediction motion information), and the process ends;
  • otherwise, step 823 is performed.
  • Step 823: continue the process of collecting candidate motion information;
  • Step 825: end the process of collecting candidate motion information.
  • For example, if the candidate index received by the decoding end is "1", it indicates that the selected candidate motion information for the current image block to be decoded is the candidate at index position 0 in the constructed candidate list.
  • As another example, suppose the index value obtained by decoding at the decoding end is 4. In the process of acquiring candidate motion information in the Merge mode, the motion information of the lower-right spatial neighboring block H of the co-located block of the currently decoded block may be used as candidate motion information.
  • Suppose that at this point the number of available candidate motion information entries is 3, which differs from the target number derived from the index value, and that the motion information of the intermediate block C0 or C3 of the co-located block cannot be determined as available; the number of available candidate motion information entries at this time is still 3, which differs from the target number derived from the index value. After candidate motion information of the unidirectional prediction type is constructed by decomposition, the number of available candidate motion information entries is 4, which is the same as the target number derived from the index value.
  • In this case, the motion information of the constructed unidirectional prediction type is selected as the best candidate motion information (i.e., the candidate motion information selected for the current image block to be decoded), and the process of acquiring candidate motion information is ended.
  • In other words, the number of candidate motion information entries that need to be constructed in the candidate list is derived from the index value.
  • Once the number of acquired candidate motion information entries is sufficient to determine the target candidate motion information using the index value, that is, once the target candidate motion information can be located in the candidate list, construction of the remaining candidate motion information in the candidate list is stopped.
  • There are two ways for the decoder to construct the candidate list: one is to match against the index value while detecting, as described above; the other is to first construct the complete candidate list and then match it against the index value to determine which candidate to select.
  • In the latter case, the motion information, that is, the candidate motion vector at the position indicated by the index, is found in the fully established candidate list.
  • The foregoing candidate list may be used in the Merge mode described above, or in other prediction modes for acquiring a predicted motion vector of a to-be-processed image block; it may be used at the encoding end, and the decoding end may construct it in a manner consistent with the corresponding encoding end.
  • In that case, the number of candidates in the candidate list is the preset maximum number and is consistent at the encoding and decoding ends; the specific number is not limited. The operation of the decoding end in this case refers to that of the encoding end and is not described again here. A minimal sketch of the first, early-terminating decoder variant is given below.
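  • The following C++ sketch illustrates the early-terminating decoder variant; deriving the target number as the parsed index plus one is an assumption used here to make the idea concrete:

    // Build the list stage by stage and stop as soon as it holds enough
    // candidates to resolve the parsed index; the candidate at that index
    // is the selected (target) candidate motion information.
    #include <cstddef>
    #include <functional>
    #include <optional>
    #include <vector>

    struct Candidate { /* motion information fields omitted */ };
    using Stage = std::function<void(std::vector<Candidate>&)>;

    std::optional<Candidate> decodeSelectedCandidate(
            std::size_t parsedIndex, const std::vector<Stage>& stages) {
        std::vector<Candidate> list;
        const std::size_t target = parsedIndex + 1;  // assumed target number
        for (const Stage& stage : stages) {
            stage(list);
            if (list.size() >= target)       // index can now be matched
                return list[parsedIndex];    // best candidate; stop early
        }
        return std::nullopt;  // list could not be built up to the index
    }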
  • FIG. 9 is an exemplary schematic diagram of adding a decomposed candidate motion vector to a merge mode candidate list in the embodiment of the present application.
  • In FIG. 9, a merge candidate of the unidirectional prediction type is generated by decomposing an original merge candidate of the bidirectional prediction type.
  • one of the original candidates of the bi-prediction type (which has mvL0 and refIdxL0, and mvL1 and refIdxL1) can be used to generate two unidirectional predictive merge candidates.
  • An original merge candidate of the bidirectional prediction type (having mvL0_A and ref0 in list 0, and mvL1_B and ref0 in list 1) is included in the original merge candidate list at index position 0.
  • One newly constructed candidate of the unidirectional prediction type has prediction type list 0 unidirectional prediction, with mvL0_A and ref0 picked up from list 0. Another newly constructed candidate of the unidirectional prediction type has prediction type list 1 unidirectional prediction, with mvL1_B and ref0 picked up from list 1. It is checked whether each newly constructed merge candidate differs from the candidates already included in the merge candidate list. If it does, the video decoder or video encoder includes the newly constructed merge candidate of the unidirectional prediction type in the merge candidate list.
  • FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate list in the embodiment of the present application.
  • the combined bi-predictive merge candidate can be generated by combining the original merge candidates.
  • two of the original candidates (which have mvL0 and refIdxL0 or mvL1 and refIdxL1) may be used to generate bi-predictive merge candidates.
  • In the example of FIG. 10, two candidates are included in the original merge candidate list. The prediction type of one candidate is list 0 unidirectional prediction, and the prediction type of the other candidate is list 1 unidirectional prediction. mvL0_A and ref0 are picked up from list 0, and mvL1_B and ref0 are picked up from list 1; a bidirectional predictive merge candidate having mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1 can then be generated, and the video decoder may include this bi-predictive merge candidate in the candidate list.
  • It is checked whether the generated candidate differs from the candidates already included in the merge candidate list before it is added to the list. The process of determining whether a candidate is different from the candidates already included in the candidate list is sometimes referred to as pruning.
  • In pruning, each newly generated candidate can be compared with the existing candidates in the list.
  • In some examples, the pruning operation may include comparing one or more new candidates with the candidates already in the candidate list and not adding any new candidate that duplicates a candidate already in the candidate list.
  • In other examples, the pruning operation can include adding one or more new candidates to the candidate list and later removing duplicate candidates from the list. A minimal sketch of the first variant is given below.
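  • A minimal C++ sketch of the compare-before-adding pruning variant; the equality test over motion vector and reference index is an illustrative assumption:

    // Append a newly generated candidate only if it differs from every
    // candidate already in the list.
    #include <algorithm>
    #include <vector>

    struct Mv { int x = 0, y = 0; };
    struct Candidate {
        Mv mv;
        int refIdx;
        bool operator==(const Candidate& o) const {
            return mv.x == o.mv.x && mv.y == o.mv.y && refIdx == o.refIdx;
        }
    };

    bool addWithPruning(std::vector<Candidate>& list, const Candidate& c) {
        if (std::find(list.begin(), list.end(), c) != list.end())
            return false;  // duplicate: pruned, not added
        list.push_back(c);
        return true;
    }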
  • FIG. 11 is a schematic block diagram of an apparatus 1100 for acquiring candidate motion information of an image block in an embodiment of the present application.
  • the candidate motion information is used to construct a candidate list for inter prediction, and the apparatus 1100 for acquiring candidate motion information of the image block includes:
  • The spatial candidate motion information acquiring module 1101 is configured to detect one or more spatial reference blocks of the current image block according to a first preset sequence, to obtain M sets of original candidate motion information in the candidate list of the to-be-processed image block, where M is an integer greater than or equal to 0;
  • The time domain candidate motion information acquiring module 1102 is configured to detect one or more time domain reference blocks of the current image block according to a second preset sequence, to obtain L sets of original candidate motion information in the candidate list of the to-be-processed image block, where L is an integer greater than or equal to 0;
  • The additional candidate motion information acquiring module 1103 is configured to: when the number of candidate motion information entries in the candidate list of the image block to be processed is smaller than a target number, perform decomposition processing on at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list, to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed, where Q is an integer greater than or equal to 0.
  • A set of original candidate motion information of the bidirectional prediction type includes: motion information for the forward prediction direction and motion information for the backward prediction direction, where the motion information for the forward prediction direction includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the motion information for the backward prediction direction includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index;
  • The Q sets of newly constructed candidate motion information of the unidirectional prediction type include: a set of motion information whose unidirectional prediction type is the forward prediction encoding/decoding mode and/or a set of motion information whose unidirectional prediction type is the backward prediction encoding/decoding mode, where the set of motion information of the forward prediction encoding/decoding mode includes a first reference image list, a first reference image index corresponding to the first reference image list, and a motion vector of the first reference image corresponding to the first reference image index; and the set of motion information of the backward prediction encoding/decoding mode includes a second reference image list, a second reference image index corresponding to the second reference image list, and a motion vector of the second reference image corresponding to the second reference image index.
  • The additional candidate motion information acquiring module is further configured to combine two sets of original candidate motion information of the unidirectional prediction type included in the candidate list, to obtain P sets of newly constructed candidate motion information of the bidirectional prediction type in the candidate list of the image block to be processed, where P is an integer greater than or equal to 0.
  • The one or more spatial reference blocks include: one or more spatial reference blocks that are in the image in which the current image block is located and adjacent to the current image block, and/or one or more spatial reference blocks that are in the image in which the current image block is located and not adjacent to the image block to be processed.
  • the one or more spatial reference blocks in the image in which the current image block is located adjacent to the current image block include:
  • a first spatial neighboring block A1 located at the left side of the current image block, a second spatial neighboring block B1 located at the upper side of the current image block, a third spatial neighboring block B0 located at the upper right side of the current image block, a fourth spatial neighboring block A0 located at the lower left side of the current image block, and a fifth spatial neighboring block B2 located at the upper left side of the current image block.
  • The spatial candidate motion information acquiring module is configured to:
  • detect the fifth spatial neighboring block B2 under the following condition: when any one of the first spatial neighboring block A1, the second spatial neighboring block B1, the third spatial neighboring block B0, and the fourth spatial neighboring block A0 is unavailable, the fifth spatial neighboring block B2 is detected.
  • The one or more time domain reference blocks include: a lower-right spatial neighboring block H of the co-located block of the current image block, an upper-left intermediate block C0 of the co-located block, a lower-right intermediate block C3 of the co-located block, an upper-left block TL of the co-located block, or a lower-right block BR of the co-located block, where the co-located block is an image block in the reference image having the same size, shape, and coordinates as the current image block.
  • the time domain candidate motion information acquiring module is configured to:
  • detect the one or more time domain reference blocks in a preset order to obtain motion information of L1, L2, or L3 determined-motion-vector image blocks, where L1 is equal to or greater than L, L2 is equal to or greater than L, L3 is equal to or greater than L, and L1, L2, and L3 are all integers greater than or equal to 0.
  • In a case where the apparatus 1100 is configured to encode or decode a video image, the target number is the preset maximum number of candidate motion information entries in the candidate list of the current image block; or, in a case where the apparatus 1100 is configured to decode a video image, the target number is the number of candidate motion information entries determined using an index identifier parsed from the code stream.
  • By constructing additional candidate motion information (e.g., candidate motion information of the unidirectional prediction type generated by decomposition, and combined candidate motion information of the bidirectional prediction type), the apparatus obtains more available candidate motion information for constructing the candidate list, so that the number of candidates can satisfy the target number (for example, the preset maximum number of candidate motion information entries in the candidate list, or the number of candidate motion information entries determined using the index identifier parsed from the code stream).
  • A determined-motion-vector image block is an image block whose motion vector has already been determined when the image block to be processed is predicted; it may be an image block that has been reconstructed or one that has not been reconstructed, which is not limited here.
  • FIG. 12 is a schematic block diagram of an implementation manner of an encoding device or a decoding device (referred to as decoding device 1200 for short) in an embodiment of the present application.
  • The decoding device 1200 can include a processor 1210, a memory 1230, and a bus system 1250, where the processor and the memory are connected through the bus system; the memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory.
  • The memory of the decoding device stores program code, and the processor can invoke the program code stored in the memory to perform the various video encoding or decoding methods described herein, particularly the video encoding or decoding methods in the various new inter prediction modes and the methods of predicting motion information in the various new inter prediction modes. To avoid repetition, details are not described here.
  • The processor 1210 may be a central processing unit (CPU), or the processor 1210 may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 1230 can include a read only memory (ROM) device or a random access memory (RAM) device. Any other suitable type of storage device can also be used as the memory 1230.
  • Memory 1230 can include code and data 1231 that are accessed by processor 1210 using bus 1250.
  • The memory 1230 can further include an operating system 1233 and applications 1235, the applications 1235 including at least one program that allows the processor 1210 to perform the video encoding or decoding methods described herein (in particular, the method for acquiring candidate motion information of an image block described herein).
  • application 1235 can include applications 1 through N, which further include a video encoding or decoding application (referred to as a video coding application) that performs the video encoding or decoding methods described herein.
  • the bus system 1250 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 1250 in the figure.
  • decoding device 1200 may also include one or more output devices, such as display 1270.
  • display 1270 can be a tactile display that combines the display with a tactile unit that operatively senses a touch input.
  • Display 1270 can be coupled to processor 1210 via bus 1250.
  • the functions described can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code via a computer readable medium and executed by a hardware-based processing unit.
  • The computer readable medium can comprise a computer readable storage medium or a communication medium; the computer readable storage medium corresponds to a tangible medium such as a data storage medium, and the communication medium comprises any medium that facilitates transfer of a computer program from one place to another, for example, in accordance with a communication protocol.
  • computer readable media may illustratively correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application.
  • the computer program product can comprise a computer readable medium.
  • The computer readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but instead are directed to non-transitory tangible storage media.
  • As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.
  • Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques can be fully implemented in one or more circuits or logic elements.
  • The techniques of the present application can be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
  • Various components, modules, or units are described herein to emphasize functional aspects of apparatuses configured to perform the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit, or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.


Abstract

Disclosed is a method for acquiring candidate motion information of an image block. The candidate motion information is used to construct a candidate list for inter-frame prediction. The method comprises the steps of: detecting one or more spatial reference blocks of a current image block according to a first preset sequence to obtain M sets of original candidate motion information in a candidate list of the current image block; detecting one or more temporal reference blocks of the current image block according to a second preset sequence to obtain L sets of original candidate motion information in a candidate list of an image block to be processed; and, if the amount of candidate motion information in the candidate list of the image block to be processed is less than a target amount, decomposing at least one set of original candidate motion information of the bidirectional prediction type included in the candidate list to obtain Q sets of newly constructed candidate motion information of the unidirectional prediction type in the candidate list of the image block to be processed. The technical solution of the present invention improves the prediction accuracy of motion vectors of an image block and increases encoding and decoding performance.
PCT/CN2017/108611 2017-10-31 2017-10-31 Procédé et dispositif d'obtention d'informations de mouvement candidates d'un bloc d'image, et codec WO2019084776A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/108611 WO2019084776A1 (fr) 2017-10-31 2017-10-31 Procédé et dispositif d'obtention d'informations de mouvement candidates d'un bloc d'image, et codec


Publications (1)

Publication Number Publication Date
WO2019084776A1 true WO2019084776A1 (fr) 2019-05-09

Family

ID=66331259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108611 WO2019084776A1 (fr) 2017-10-31 2017-10-31 Procédé et dispositif d'obtention d'informations de mouvement candidates d'un bloc d'image, et codec

Country Status (1)

Country Link
WO (1) WO2019084776A1 (fr)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070047649A1 (en) * 2005-08-30 2007-03-01 Sanyo Electric Co., Ltd. Method for coding with motion compensated prediction
CN103096082A (zh) * 2013-01-22 2013-05-08 清华大学 一种基于时域演变的双向运动估算方法
CN103338372A (zh) * 2013-06-15 2013-10-02 浙江大学 一种视频处理方法及装置
CN103765896A (zh) * 2011-06-27 2014-04-30 三星电子株式会社 用于对运动信息进行编码的方法和设备以及用于对运动信息进行解码的方法和设备
CN107113446A (zh) * 2014-12-09 2017-08-29 联发科技股份有限公司 视频编码中的运动矢量预测子或合并候选的推导方法


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210409686A1 (en) * 2019-03-11 2021-12-30 Hangzhou Hikvision Digital Technology Co., Ltd. Method for constructing motion information candidate list, method and apparatus for triangle prediction decoding
US11863714B2 (en) * 2019-03-11 2024-01-02 Hangzhou Hikvision Digital Technology Co., Ltd. Method for constructing motion information candidate list, method and apparatus for triangle prediction decoding


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17930245

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17930245

Country of ref document: EP

Kind code of ref document: A1