WO2019052330A1 - Encoding and decoding method and apparatus for motion information - Google Patents


Info

Publication number
WO2019052330A1
Authority
WO
WIPO (PCT)
Prior art keywords
image block
reconstructed image
motion information
adjacent reconstructed
candidate
Prior art date
Application number
PCT/CN2018/102632
Other languages
French (fr)
Chinese (zh)
Inventor
张娜
安基程
林永兵
郑建铧
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2019052330A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/61 Transform coding in combination with predictive coding
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • the present application relates to the field of video image technology, and in particular, to a method and apparatus for encoding and decoding motion information.
  • Digital video capabilities can be incorporated into many devices, including digital TVs, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), notebook or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio phones, video conferencing devices, video streaming devices, and the like.
  • Digital video devices implement video compression techniques, such as those defined in the MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC) standards, and those described in extensions of these standards, to transmit and receive digital video information more efficiently.
  • Video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video codec techniques.
  • Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
  • a video image may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs), decoding units, decoding nodes, and the like.
  • a video block in an intra-coded (I) slice of an image is encoded using spatial prediction of reference samples in neighboring blocks in the same image.
  • a video block in an inter-coded (P or B) slice of an image is encoded using spatial prediction of reference samples in neighboring blocks in the same image, or temporal prediction of reference samples in other reference images.
  • An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
  • the present application introduces a prediction method. Specifically, when the to-be-processed block has a plurality of candidate prediction motion information, the similarity between the to-be-processed block and the reference image blocks indicated by its candidate prediction motion information is used as a priori knowledge to assist in determining the coding mode of the identifier of each candidate prediction motion information, thereby saving coding bits and improving coding efficiency.
  • the similarity between the to-be-processed block and a reference image block is approximated by the similarity between the reconstructed pixel set around the to-be-processed block and the reconstructed pixel set corresponding to the reference image block; that is, the similarity between these two reconstructed pixel sets is used to represent the similarity between the to-be-processed block and the reference image block indicated by its candidate prediction motion information.
  • the embodiment of the present application is applicable to a scenario in which a reference image block is determined from a plurality of reference image blocks of a to-be-processed block, and the identification information of the reference image block is encoded.
  • the plurality of reference image blocks may be derived from an inter-frame type prediction mode, an intra-frame type prediction mode, an inter-view prediction mode (multi-view or 3D video coding), or an inter-layer prediction mode (scalable video coding), regardless of the specific reference image block acquisition method (such as ATMVP, STMVP, or intra block copy mode), and regardless of whether the motion information indicating the reference image block is the motion vector of the entire coding unit or the motion information of a certain sub-coding unit within the coding unit. All of the foregoing prediction modes and methods for acquiring the reference image block (that is, acquiring the motion information) conform to the applicable scenarios of the embodiments of the present application and can achieve the technical effect of improving coding efficiency according to, or in combination with, the solutions in the embodiments of the present application.
  • an encoding method for image block prediction motion information including the steps of: acquiring N candidate prediction motion information of an image block to be processed. Where N is an integer greater than one.
  • the N candidate prediction motion information are different from each other. It should be understood that when the motion information includes both the motion vector and the reference frame information, "different from each other" also covers the case where the motion vectors are the same but the reference frame information differs.
  • the technique of pruning has been introduced in the foregoing; it should be understood that in the process of obtaining the N candidate prediction motion information of the image block to be processed, a pruning operation is performed so that the N candidate motion information finally obtained are different from each other, and details are not described again here.
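The candidate-list construction with pruning described above can be sketched as follows; this is a hypothetical illustration, not the patent's implementation, and it models motion information as (mv_x, mv_y, ref_idx) tuples so that candidates with the same motion vector but different reference frame information count as different:

```python
def build_candidate_list(neighbor_motion_infos, n):
    """Collect up to n mutually different candidates in scan order.

    A candidate is pruned only if its motion vector AND its reference
    frame information both match an entry already in the list.
    """
    candidates = []
    for info in neighbor_motion_infos:
        if info not in candidates:  # pruning: skip exact duplicates
            candidates.append(info)
        if len(candidates) == n:
            break
    return candidates
```

With this modeling, `(1, 0, 0)` and `(1, 0, 1)` are both kept, since the reference frame index differs even though the motion vector is identical.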
  • the acquiring the N candidate prediction motion information of the to-be-processed image block includes: acquiring, in a preset order, the motion information of N mutually different image blocks that have a preset positional relationship with the to-be-processed image block as the N candidate prediction motion information.
  • the acquiring the N candidate prediction motion information of the to-be-processed image block includes: acquiring, in a preset order, the motion information of M mutually different image blocks that have a preset positional relationship with the to-be-processed image block as M candidate prediction motion information, wherein the M candidate prediction motion information includes the N candidate prediction motion information and M is an integer greater than N; determining a manner of grouping the M candidate prediction motion information; and determining, according to the grouping manner, the N candidate prediction motion information from the M candidate prediction motion information.
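The grouping step can be sketched as below; the specific grouping manner (equal-size groups in list order, selected by a group index) is purely an assumption for illustration, since the text leaves the grouping manner open:

```python
def select_group(candidates_m, n, group_index=0):
    """Partition M candidates into groups of n and pick one group.

    This assumed grouping manner splits the M-entry list into
    consecutive groups of size n; the chosen group supplies the
    N candidate prediction motion information actually used.
    """
    groups = [candidates_m[i:i + n] for i in range(0, len(candidates_m), n)]
    return groups[group_index]
```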
  • for different groups, different processing manners may be adopted, or the same processing manner may be adopted.
  • the grouping manner is encoded into the code stream.
  • alternatively, the grouping manner is fixed at both the encoding end and the decoding end and kept consistent through a preset protocol.
  • the encoding end also needs to make the decoding end know the specific candidate prediction motion information.
  • the candidate prediction motion information is encoded into the code stream; or second identification information indicating the image block having the preset positional relationship with the to-be-processed image block is encoded into the code stream; or third identification information that has a preset correspondence with the N candidate prediction motion information is encoded into the code stream.
  • alternatively, the candidate prediction motion information is fixed at both the encoding end and the decoding end and kept consistent through a preset protocol.
  • the candidate prediction motion information is grouped, and different groups are allowed to adopt different processing manners, which makes the coding method more flexible and reduces computational complexity.
  • the encoding method further includes the step of determining that adjacent reconstructed image blocks of the image block to be processed are available.
  • determining that the adjacent reconstructed image blocks of the to-be-processed image block are available comprises determining that at least one of at least two of the original adjacent reconstructed image blocks is available.
  • identification information is required to encode auxiliary information such as the foregoing grouping manner and/or the processing manner of each group.
  • the availability of the adjacent reconstructed image blocks of the image block to be processed may also be determined first.
  • when the adjacent reconstructed image blocks are not available, encoding can be performed directly in a conventional manner without further encoding the above auxiliary information, thereby saving coded bits.
  • the encoding method further includes the step of: acquiring distortion values corresponding to the N candidate prediction motion information.
  • the reference adjacent reconstructed image block is identical in shape and equal in size to the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the image block to be processed.
  • the image block to be processed is a rectangle, and the image block to be processed has a width W and a height H, and the original adjacent reconstructed image block is a rectangle.
  • a lower boundary of the original adjacent reconstructed image block is adjacent to an upper boundary of the image block to be processed, the width of the original adjacent reconstructed image block is W, and the height is n;
  • or, a lower boundary of the original adjacent reconstructed image block is adjacent to an upper boundary of the image block to be processed, the width of the original adjacent reconstructed image block is W+H, and the height is n;
  • or, a right boundary of the original adjacent reconstructed image block is adjacent to a left boundary of the image block to be processed, the width of the original adjacent reconstructed image block is n, and the height is H;
  • or, the right boundary of the original adjacent reconstructed image block is adjacent to a left boundary of the image block to be processed, the width of the original adjacent reconstructed image block is n, and the height is W+H. Where W, H, and n are positive integers.
  • n may be set to 1 or 2, so that no additional storage space is needed to store the original adjacent reconstructed image block, which simplifies the hardware implementation.
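The template geometry described in the bullets above can be sketched as a small helper; coordinate conventions (top-left origin, rectangles as (x, y, width, height)) and the function name are illustrative assumptions:

```python
def neighbor_templates(x0, y0, w, h, n=2, extended=False):
    """Rectangles of the original adjacent reconstructed image blocks.

    For a W x H block with top-left corner (x0, y0):
    - the top template sits above the block (width W, or W+H when
      extended, height n);
    - the left template sits left of the block (width n, height H,
      or W+H when extended).
    """
    top_w = w + h if extended else w
    left_h = w + h if extended else h
    top = (x0, y0 - n, top_w, n)
    left = (x0 - n, y0, n, left_h)
    return top, left
```

The reference adjacent reconstructed image blocks would be placed in the same relative positions around the reference image block.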
  • This step first needs to obtain a reference adjacent reconstructed image block of the reference image block of the image block to be processed indicated by the N candidate prediction motion information.
  • the motion vector in the candidate prediction motion information may point to a sub-pixel position in the reference frame; in this case, the pixels of the reference frame image need to be interpolated to obtain the reference adjacent reconstructed image block.
  • the 8-tap filter {-1, 4, -11, 40, 40, -11, 4, -1} may be used for sub-pixel interpolation, or, to reduce computational complexity, a bilinear interpolation filter may be used instead.
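A half-pel interpolation step with the 8-tap filter quoted above can be sketched as follows; the taps sum to 64, so the result is normalized with a 6-bit shift, and the edge-clamping boundary handling is an assumption for illustration:

```python
TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]  # taps sum to 64

def interp_half_pel(row, x):
    """Half-pel sample between row[x] and row[x + 1] on one pixel row."""
    acc = 0
    for k, t in enumerate(TAPS):
        idx = min(max(x - 3 + k, 0), len(row) - 1)  # clamp at the borders
        acc += t * row[idx]
    return (acc + 32) >> 6  # round and normalize by the tap sum of 64
```

On a constant row the filter reproduces the input value, and on a linear ramp it lands between the two neighboring integer samples, as expected of an interpolation filter.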
  • This step then calculates a difference characterization value of the reference neighboring reconstructed image block of the reference image block and the original adjacent reconstructed image block of the block to be processed as a distortion value.
  • the difference characterization value can be calculated in a variety of ways, such as the mean absolute difference (MAD), the sum of absolute differences (SAD), the sum of squared differences (SSD), the mean squared difference (MSD), the sum of absolute Hadamard-transformed differences (SATD), the normalized cross-correlation measure (NCC), or a similarity measure based on sequential similarity detection (SSDA), and so on.
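Three of the simpler metrics listed above can be sketched directly; the blocks are modeled here as flat lists of equal length, which is an assumption for illustration:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized pixel blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mad(a, b):
    """Mean absolute difference: SAD normalized by the pixel count."""
    return sad(a, b) / len(a)
```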
  • the plurality of the original adjacent reconstructed image blocks include a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block; correspondingly, the plurality of the reference adjacent reconstructed image blocks include a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; correspondingly, the distortion value being represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  • the distortion value is obtained according to the following calculation formula:
  • Distortion = Δ_1 + Δ_2 + … + Δ_p, where Distortion represents the distortion value, Δ_i represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, p represents the number of the original adjacent reconstructed image blocks used to calculate the distortion value, and Δ may be computed by any of the calculation methods such as MAD, SAD, and SSD.
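The summation over the p template pairs can be sketched as below; SAD stands in for Δ here, and the flat-list block representation is an assumption for illustration:

```python
def distortion(original_templates, reference_templates):
    """Distortion = sum of per-template difference values Delta_i.

    Each Delta_i is computed here as the SAD between the i-th original
    adjacent reconstructed block and the i-th reference adjacent
    reconstructed block; any of the listed metrics could be substituted.
    """
    assert len(original_templates) == len(reference_templates)
    return sum(
        sum(abs(x - y) for x, y in zip(o, r))
        for o, r in zip(original_templates, reference_templates)
    )
```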
  • the embodiment of the present application may be applied to inter-frame bidirectional prediction, in which the reference image block indicated by the candidate prediction motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block; correspondingly, the distortion value being represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, wherein the average reference adjacent reconstructed image block is obtained by calculating the pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or, the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, wherein the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
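The two bidirectional variants can be sketched side by side; templates are modeled as flat pixel lists, SAD stands in for the difference characterization value, and the rounded integer averaging is an assumption:

```python
def distortion_avg_template(orig, ref0, ref1):
    """Variant (a): SAD against the pixel-averaged reference template."""
    avg = [(p0 + p1 + 1) >> 1 for p0, p1 in zip(ref0, ref1)]  # rounded mean
    return sum(abs(o, ) if False else abs(o - a) for o, a in zip(orig, avg))

def distortion_mean_of_sads(orig, ref0, ref1):
    """Variant (b): mean of the two per-direction SADs."""
    sad0 = sum(abs(o - r) for o, r in zip(orig, ref0))
    sad1 = sum(abs(o - r) for o, r in zip(orig, ref1))
    return (sad0 + sad1) / 2
```

Note that the two variants generally give different values: averaging the templates first can cancel opposite-signed errors that the mean-of-SADs variant still counts.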
  • when the image block to be processed has candidate prediction motion information at the sub-block level, the distortion values corresponding to each sub-block adjacent to the original adjacent reconstructed image block may be obtained separately and summed as the distortion value of the image block to be processed.
  • the encoding method further includes the step of determining the first identification information of each of the N candidate prediction motion information according to the magnitude relationship among the acquired N distortion values, the N candidate prediction motion information corresponding one-to-one to their respective first identification information.
  • this step first compares the sizes of the N distortion values.
  • for example, the N candidate prediction motion information may be arranged in order of distortion value from small to large or from large to small. Then, the first identification information of each of the N candidate prediction motion information is assigned according to the comparison result.
  • the length of the binary character string of the first identification information of the candidate prediction motion information with the smaller distortion value is less than or equal to the length of the binary character string of the first identification information of the candidate prediction motion information with the larger distortion value.
  • the candidate prediction motion information with large similarity (small distortion value) is more likely to be selected as the prediction information, and is given a binary character string with a shorter codeword to represent its identification value, which can save coding bits and improve coding efficiency.
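The identifier assignment can be sketched as below; the truncated-unary binarization is an illustrative assumption (the text only requires that smaller distortion never gets a longer codeword):

```python
def assign_codewords(distortions):
    """Rank candidates by ascending distortion; assign truncated-unary
    binary strings so smaller distortion gets a codeword no longer than
    any larger distortion's codeword."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    codewords = [None] * len(distortions)
    for rank, idx in enumerate(order):
        if rank < len(distortions) - 1:
            codewords[idx] = "1" * rank + "0"
        else:
            codewords[idx] = "1" * rank  # last codeword needs no terminator
    return codewords
```

For distortions [30, 5, 12] the candidate with distortion 5 receives the one-bit string "0", the next "10", and the worst "11".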
  • the encoding method further includes the step of: when the target prediction motion information of the image block to be processed is one of the N candidate prediction motion information for which the first identification information has been determined, encoding the first identification information of the target prediction motion information into the code stream.
  • a second aspect of the embodiments of the present application provides a method for decoding image block prediction motion information, including: parsing target identification information of the target prediction motion information of a to-be-processed image block from a code stream; determining N candidate prediction motion information, where the N candidate prediction motion information includes the target prediction motion information and N is an integer greater than 1; acquiring distortion values corresponding to the N candidate prediction motion information, the distortion values being determined by the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information and the adjacent reconstructed image block of the image block to be processed; determining the first identification information of each of the N candidate prediction motion information according to the magnitude relationship among the acquired N distortion values, the N candidate prediction motion information corresponding one-to-one to their respective first identification information; and determining the candidate prediction motion information corresponding to the first identification information that matches the target identification information as the target prediction motion information.
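The decoder side can be sketched as a mirror of the encoder: it recomputes the same distortions from reconstructed pixels, reproduces the same identifier assignment, and picks the candidate whose identifier matches the parsed one. The truncated-unary binarization below is the same illustrative assumption as before, not the patent's mandated scheme:

```python
def assign_codewords(distortions):
    """Same assumed truncated-unary assignment as on the encoder side."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    codewords = [None] * len(distortions)
    for rank, idx in enumerate(order):
        suffix = "0" if rank < len(distortions) - 1 else ""
        codewords[idx] = "1" * rank + suffix
    return codewords

def decode_target(candidates, distortions, parsed_id):
    """Select the candidate whose derived identifier matches parsed_id.

    Because both ends compute the distortions from the same reconstructed
    pixels, the derived codeword table is identical, and no explicit index
    into the candidate list needs to be transmitted.
    """
    for cand, cw in zip(candidates, assign_codewords(distortions)):
        if cw == parsed_id:
            return cand
    raise ValueError("no candidate matches the parsed identifier")
```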
  • the similarity between the to-be-processed block and the reference image blocks indicated by its candidate prediction motion vectors is used as a priori knowledge to assist in determining the encoding of the identifier of each candidate prediction motion vector, thereby saving coding bits and improving coding efficiency.
  • the embodiments of the second aspect of the present application correspond to the encoding method of the first aspect, and the beneficial technical effects are the same. Reference may be made to the description of the technical effects in the first aspect; details are not described herein again.
  • the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed image block of the to-be-processed image block includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block; the distortion value being determined by the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information and the adjacent reconstructed image block of the image block to be processed includes: the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block has the same shape and the same size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the image block to be processed.
  • the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute difference of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of absolute differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of squared differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the mean squared difference of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of absolute Hadamard-transformed differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the normalized cross-correlation measure of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or a similarity measure based on sequential similarity detection of the reference adjacent reconstructed image block and the original adjacent reconstructed image block.
  • the image block to be processed is a rectangle
  • the width of the image block to be processed is W
  • the height is H
  • the original adjacent reconstructed image block is a rectangle.
  • the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the image block to be processed, including: the original adjacent reconstructed image block has a width W and a height n; or, the original adjacent reconstructed image block has a width W+H and a height n; where W, H, and n are positive integers.
  • n is 1 or 2.
  • the image block to be processed is a rectangle
  • the width of the image block to be processed is W
  • the height is H
  • the original adjacent reconstructed image block is a rectangle.
  • the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the image block to be processed, including: the original adjacent reconstructed image block has a width n and a height H; or, the original adjacent reconstructed image block has a width n and a height W+H; where W, H, and n are positive integers.
  • n is 1 or 2.
  • the reference image block indicated by the candidate prediction motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block; correspondingly, the distortion value being represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, wherein the average reference adjacent reconstructed image block is obtained by calculating the pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or, the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, wherein the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
  • the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a plurality of the reference adjacent reconstructed image blocks, the plurality of the reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; the adjacent reconstructed image block of the to-be-processed image block includes a plurality of the original adjacent reconstructed image blocks, the plurality of the original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block; the distortion value being represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
• the distortion value is obtained according to the following calculation formula:

  Distortion = Σ_{i=1}^{p} D_i

• where Distortion represents the distortion value, D_i represents the difference characterization value between the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
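As an illustrative sketch (not part of the claimed method), the summation of per-block difference characterization values can be expressed in Python, with SAD as an assumed difference characterization value:

```python
def sad(block_a, block_b):
    # Sum of absolute differences, used here as an assumed difference
    # characterization value D_i for one block pair.
    return sum(abs(x - y)
               for row_a, row_b in zip(block_a, block_b)
               for x, y in zip(row_a, row_b))

def distortion(original_blocks, reference_blocks):
    # Distortion = sum over i = 1..p of D_i, where p is the number of
    # original adjacent reconstructed blocks used in the calculation.
    assert len(original_blocks) == len(reference_blocks)
    return sum(sad(o, r) for o, r in zip(original_blocks, reference_blocks))
```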
• the determining, according to the magnitude relationship among the acquired N distortion values, the first identification information of each of the N candidate predicted motion information includes: comparing the magnitudes of the N distortion values; and assigning the first identification information to each of the N candidate predicted motion information according to the comparison result, where the length of the binary string of the first identification information of the candidate predicted motion information with the smaller distortion value is less than or equal to the length of the binary string of the first identification information of the candidate predicted motion information with the larger distortion value.
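As an illustrative sketch (not part of the claimed method), one way to satisfy the length constraint above is to rank candidates by distortion and assign each a truncated-unary binary string; the truncated-unary binarization is an assumption of this sketch:

```python
def truncated_unary(rank, n):
    # Truncated unary binarization (an assumed binarization scheme):
    # rank 0 -> "0", rank 1 -> "10", ..., rank n-1 -> "1" * (n - 1).
    return "1" * rank + ("0" if rank < n - 1 else "")

def assign_first_ids(distortions):
    # Smaller distortion -> binary string whose length is less than or
    # equal to that of any candidate with a larger distortion.
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    ids = [""] * len(distortions)
    for rank, cand in enumerate(order):
        ids[cand] = truncated_unary(rank, len(distortions))
    return ids
```

For example, distortions [40, 10, 25] give candidate 1 (the smallest distortion) the shortest string.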
• the comparing the magnitudes of the N distortion values includes: sequentially arranging the N candidate predicted motion information according to the distortion values, from small to large or from large to small.
• the determining the N candidate predicted motion information includes: acquiring, in a preset order, the motion information of the image blocks at N different preset positions relative to the to-be-processed image block as the N candidate predicted motion information.
• the determining the N candidate predicted motion information includes: acquiring, in a preset order, the motion information of the image blocks at M different preset positions relative to the to-be-processed image block as M candidate predicted motion information, where the M candidate predicted motion information includes the N candidate predicted motion information and M is an integer greater than N; determining a grouping manner of the M candidate predicted motion information; and determining, according to the target identification information and the grouping manner, the N candidate predicted motion information from the M candidate predicted motion information.
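As an illustrative sketch (not part of the claimed method), one assumed grouping manner partitions the M candidates into consecutive groups of size N, with the target identification information indicating which group to keep:

```python
def select_group(m_candidates, group_size, target_group_index):
    # Partition the M candidates into consecutive groups of N = group_size
    # (an assumed grouping manner) and return the group indicated by the
    # target identification information.
    groups = [m_candidates[i:i + group_size]
              for i in range(0, len(m_candidates), group_size)]
    return groups[target_group_index]
```

The actual grouping manner may instead be preset or parsed from the code stream, as the text describes.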
• the determining a grouping manner of the M candidate predicted motion information includes: determining a preset grouping manner; or parsing the grouping manner from the code stream.
• the determining the N candidate predicted motion information includes: parsing the coding information of multiple candidate predicted motion information in the code stream to obtain the N candidate predicted motion information; or parsing second identification information in the code stream to obtain the N candidate image blocks indicated by the second identification information, and using the motion information of the N candidate image blocks as the N candidate predicted motion information; or parsing third identification information in the code stream to obtain the N candidate predicted motion information having a preset correspondence with the third identification information.
• before the obtaining the distortion value corresponding to each of the N candidate predicted motion information, the method further includes: determining that the adjacent reconstructed image blocks of the to-be-processed image block are available.
• the determining that the adjacent reconstructed image blocks of the to-be-processed image block are available includes: determining that at least one of the at least two original adjacent reconstructed image blocks is available.
  • the method further includes: determining to perform the acquiring the distortion value corresponding to each of the N candidate prediction motion information.
• the determining to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information includes: determining, according to the grouping manner, to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information; or parsing fourth identification information in the code stream to determine to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information.
• a third aspect of the embodiments of the present application provides an apparatus for encoding predicted motion information of an image block, including: an acquiring module, configured to acquire N candidate predicted motion information of an image block to be processed, where N is an integer greater than one; a calculation module, configured to acquire a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block; a comparison module, configured to determine first identification information of each of the N candidate predicted motion information according to the magnitude relationship among the acquired N distortion values, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence; and an encoding module, configured to: when the target predicted motion information of the to-be-processed image block is one of the N candidate predicted motion information for which the first identification information has been determined, encode the first identification information of the target predicted motion information into the code stream.
  • the encoding apparatus further includes a detecting module, configured to determine that the adjacent reconstructed image block of the to-be-processed image block exists.
  • the encoding apparatus further includes a determining module, configured to determine to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information.
• a fourth aspect of the embodiments of the present application provides an apparatus for decoding predicted motion information of an image block, including: a parsing module, configured to parse target identification information of target predicted motion information of an image block to be processed from a code stream; a determining module, configured to determine N candidate predicted motion information, where the N candidate predicted motion information includes the target predicted motion information and N is an integer greater than 1; a calculation module, configured to acquire a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block; a comparison module, configured to determine, according to the magnitude relationship among the acquired N distortion values, first identification information of each of the N candidate predicted motion information, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence; and a selection module, configured to determine the candidate predicted motion information corresponding to the first identification information that matches the target identification information as the target predicted motion information.
  • the decoding apparatus further includes a detecting module, configured to determine that the adjacent reconstructed image block of the to-be-processed image block exists.
  • the decoding apparatus further includes a determining module, configured to determine to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information.
• a fifth aspect of the embodiments of the present application provides a processing apparatus for predicted motion information of an image block, including: a memory for storing a program including code; a transceiver for communicating with other devices; and a processor for executing the program code in the memory.
• when the code is executed, the processor may implement the operations of the method of the first aspect or the method of the second aspect; the transceiver is configured to perform specific signal transmission and reception driven by the processor.
• a sixth aspect of the embodiments of the present application provides a computer storage medium for storing computer software instructions for the above encoding apparatus, decoding apparatus, or processing apparatus, including a program designed to perform the method of the first aspect or the second aspect.
  • a seventh aspect of an embodiment of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect described above.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application
  • FIG. 2 is a schematic block diagram of a video encoder in an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of a video decoder in an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application.
  • FIG. 5 is an exemplary flowchart of a merge prediction mode in an embodiment of the present application.
  • FIG. 6 is an exemplary flowchart of an advanced motion vector prediction mode in an embodiment of the present application.
  • FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an encoding unit and an adjacent position image block associated therewith according to an embodiment of the present application.
  • FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in the embodiment of the present application.
  • FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
  • FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate motion vector list in the embodiment of the present application;
  • FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate motion vector list in the embodiment of the present application
  • FIG. 13 is a schematic flowchart of an encoding method according to an embodiment of the present application.
  • FIG. 14 is a schematic diagram of a relationship between an adjacent reconstructed image block and an original adjacent reconstructed image block in the embodiment of the present application;
  • FIG. 15 is a schematic diagram of a sub-block level motion information processing manner according to an embodiment of the present application.
  • FIG. 16 is a schematic flowchart of a decoding method according to an embodiment of the present application.
  • FIG. 17 is a schematic block diagram of an encoding apparatus according to an embodiment of the present application.
  • FIG. 18 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application.
  • FIG. 19 is a schematic block diagram of an apparatus according to an embodiment of the present application.
  • FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 in an embodiment of the present application.
  • system 10 includes source device 12 that produces encoded video data that will be decoded by destination device 14 at a later time.
• Source device 12 and destination device 14 may comprise any of a variety of devices, including desktop computers, notebook computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" pads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, or the like.
  • source device 12 and destination device 14 can be used for wireless communication.
  • Transport channel 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14.
  • transport channel 16 may include communication media that enables source device 12 to transmit encoded video data directly to destination device 14 in real time.
  • the encoded video data may be modulated according to a communication standard (e.g., a wireless communication protocol) and transmitted to destination device 14.
  • Communication media can include any wireless or wired communication medium, such as a radio frequency spectrum or one or more physical transmission lines.
• the communication medium can form part of a packet-based network (e.g., a local area network, a wide area network, or a global network such as the Internet).
  • Communication media can include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.
  • the encoded data can be output from the output interface to the storage device.
  • encoded data can be accessed from a storage device by an input interface.
• the storage device can include any of a variety of distributed or locally accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device may correspond to a file server or another intermediate storage device that may maintain encoded video produced by source device 12. Destination device 14 may access the stored video data from the storage device via streaming or download.
  • the file server can be any type of server capable of storing encoded video data and transmitting this encoded video data to destination device 14.
• possible implementations of the file server include a web server, a file transfer protocol server, a network attached storage device, or a local disk drive.
  • Destination device 14 can access the encoded video data via any standard data connection that includes an Internet connection.
  • This data connection may include a wireless channel (eg, a Wi-Fi connection), a wired connection (eg, a cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server.
  • the transmission of encoded video data from a storage device can be streaming, downloading, or a combination of both.
  • system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
  • source device 12 includes a video source 18, a video encoder 20, and an output interface.
  • the output interface can include a modem (Modem) 22 and/or a transmitter 24.
• video source 18 may include sources such as a video capture device (e.g., a camera), a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of these sources.
• when the video source 18 is a video camera, the source device 12 and the destination device 14 may form a so-called camera phone or video phone.
  • the techniques described in this application are illustratively applicable to video decoding and are applicable to wireless and/or wired applications.
  • Captured, pre-captured, or computer generated video may be encoded by video encoder 20.
  • the encoded video data can be transmitted directly to the destination device 14 via the output interface of the source device 12.
  • the encoded video data may also (or alternatively) be stored on a storage device for later access by the destination device 14 or other device for decoding and/or playback.
  • Destination device 14 includes an input interface, video decoder 30, and display device 32.
  • the input interface can include a receiver 26 and/or a modem 28.
• the input interface of the destination device 14 receives the encoded video data via the transmission channel 16.
• the encoded video data communicated over the transmission channel 16 or provided on the storage device may include various syntax elements generated by the video encoder 20 for use by the video decoder 30 in decoding the video data. These syntax elements may be included with the encoded video data that is transmitted over a communication medium, stored on a storage medium, or stored on a file server.
  • Display device 32 may be integrated with destination device 14 or external to destination device 14.
  • destination device 14 can include an integrated display device and is also configured to interface with an external display device.
  • the destination device 14 can be a display device.
  • display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.
  • Video encoder 20 and video decoder 30 may operate in accordance with, for example, the next generation video codec compression standard (H.266) currently under development and may conform to the H.266 Test Model (JEM).
• video encoder 20 and video decoder 30 may operate according to, for example, the ITU-T H.265 standard, also referred to as the High Efficiency Video Coding standard, or other proprietary or industry standards such as the ITU-T H.264 standard, or extensions of these standards.
  • the ITU-TH.264 standard is alternatively referred to as MPEG-4 Part 10, also known as advanced video coding (AVC).
  • the techniques of this application are not limited to any particular decoding standard.
  • Other possible implementations of the video compression standard include MPEG-2 and ITU-TH.263.
  • video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder and may include a suitable multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in a separate data stream.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
  • Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), Field Programmable Gate Array (FPGA), discrete logic, software, hardware, firmware, or any combination thereof.
  • the device may store the instructions of the software in a suitable non-transitory computer readable medium and execute the instructions in hardware using one or more processors to perform the techniques of the present application.
• Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
  • the present application may illustratively involve video encoder 20 "signaling" particular information to another device, such as video decoder 30.
• video encoder 20 may signal information by associating particular syntax elements with various encoded portions of the video data. That is, video encoder 20 may "signal" the data by storing particular syntax elements in the header information of the various encoded portions of the video data.
  • these syntax elements may be encoded and stored (eg, stored to storage system 34 or file server 36) prior to being received and decoded by video decoder 30.
• the term "signaling" may illustratively refer to the communication of syntax elements and/or other data used to decode the compressed video data, whether this communication occurs in real time, near real time, or over a span of time, such as when syntax elements are stored to a medium at encoding time and subsequently retrieved by the decoding device at any time after being stored to this medium.
• the JCT-VC developed the H.265 (HEVC) standard.
  • HEVC standardization is based on an evolution model of a video decoding device called the HEVC Test Model (HM).
  • the latest standard documentation for H.265 is available at http://www.itu.int/rec/T-REC-H.265.
• the latest version of the standard document is H.265 (12/16), and the full text of the standard document is incorporated herein by reference.
  • the HM assumes that the video decoding device has several additional capabilities with respect to existing algorithms of ITU-TH.264/AVC. For example, H.264 provides nine intra-prediction coding modes, while HM provides up to 35 intra-prediction coding modes.
  • JVET is committed to the development of the H.266 standard.
  • the H.266 standardization process is based on an evolution model of a video decoding device called the H.266 test model.
  • the algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is included in JVET-F1001-v2, which is incorporated herein by reference in its entirety.
  • the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
  • the motion information includes motion vectors and reference frame information, while in other embodiments, where the reference frame information is determined, the motion vectors are the primary subject of interest in the motion information.
• when the reference frame information is implicitly determined (for example, when the method of the embodiments of the present application is used in intra prediction, or when only one reference frame is used), the reference frame information may be determined by the method of the embodiments of the present application (for example, using the same processing method as for the motion vector) or by other methods; this does not mean that the reference frame information can be ignored.
• HM can divide a video frame or image into a sequence of tree blocks or largest coding units (LCUs) containing both luma and chroma samples; a tree block is also referred to as a coding tree unit (CTU).
  • Treeblocks have similar purposes to macroblocks of the H.264 standard.
• a slice contains several consecutive tree blocks in decoding order.
• a video frame or image can be segmented into one or more slices.
  • Each tree block can be split into coding units according to a quadtree. For example, a tree block that is the root node of a quadtree can be split into four child nodes, and each child node can be a parent node again and split into four other child nodes.
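As an illustrative sketch (not part of the standard text), the recursive quadtree splitting of a tree block can be expressed in Python; the `should_split` decision function stands in for the encoder's rate-distortion decision:

```python
def quadtree_split(x, y, size, min_size, should_split):
    # Recursively split a square tree block into four child nodes until
    # should_split(x, y, size) is False or min_size is reached; the
    # non-splittable leaf nodes are returned as (x, y, size) tuples.
    if size > min_size and should_split(x, y, size):
        half = size // 2
        leaves = []
        for dy in (0, half):
            for dx in (0, half):
                leaves += quadtree_split(x + dx, y + dy, half,
                                         min_size, should_split)
        return leaves
    return [(x, y, size)]
```

For example, splitting a 64×64 tree block only at the root yields four 32×32 leaf nodes.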
  • the final non-splitable child nodes that are leaf nodes of the quadtree include decoding nodes, such as decoded video blocks.
  • the syntax data associated with the decoded code stream may define the maximum number of times the tree block can be split, and may also define the minimum size of the decoded node.
  • the coding unit includes a decoding node and a prediction unit (PU) and a transform unit (TU) associated with the decoding node.
  • the size of the CU corresponds to the size of the decoding node and the shape must be square.
  • the size of the CU may range from 8 x 8 pixels up to a maximum of 64 x 64 pixels or larger.
  • Each CU may contain one or more PUs and one or more TUs.
  • syntax data associated with a CU may describe a situation in which a CU is partitioned into one or more PUs.
• the split mode may differ between the cases where the CU is skipped, or encoded with direct mode coding, intra prediction mode coding, or inter prediction mode coding.
• the PU may be partitioned to have a non-square shape.
  • syntax data associated with a CU may also describe a situation in which a CU is partitioned into one or more TUs according to a quadtree.
• the shape of the TU can be square or non-square.
  • the HEVC standard allows for transforms based on TUs, which can be different for different CUs.
  • the TU is typically sized based on the size of the PU within a given CU defined for the partitioned LCU, although this may not always be the case.
  • the size of the TU is usually the same as or smaller than the PU.
  • the residual samples corresponding to the CU may be subdivided into smaller units using a quadtree structure called a "residual quaternary tree" (RQT).
  • the leaf node of the RQT can be referred to as a TU.
  • the pixel difference values associated with the TU may be transformed to produce transform coefficients, which may be quantized.
  • a PU contains data related to the prediction process.
• when the PU is intra-mode encoded, the PU may include data describing the intra prediction mode of the PU.
• when the PU is inter-mode encoded, the PU may include data defining a motion vector of the PU.
• the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), a reference image to which the motion vector points, and/or a reference image list for the motion vector (e.g., list 0, list 1, or list C).
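As an illustrative sketch (not part of the standard text), the fields listed above can be gathered in a container type; the field names and unit conventions are assumptions of this sketch, not defined by the standard:

```python
from dataclasses import dataclass

@dataclass
class PuMotionData:
    # Hypothetical container for the motion data fields listed above.
    mv_x: int        # horizontal motion vector component
    mv_y: int        # vertical motion vector component
    precision: str   # e.g. "1/4-pel" or "1/8-pel" (assumed encoding)
    ref_idx: int     # index of the reference image pointed to
    ref_list: str    # "list0", "list1", or "listC"
```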
  • TUs use transform and quantization processes.
  • a given CU with one or more PUs may also contain one or more TUs.
  • video encoder 20 may calculate residual values corresponding to the PU.
  • the residual values include pixel difference values, which can be transformed into transform coefficients, quantized, and scanned using TU to produce serialized transform coefficients for entropy decoding.
• the present application generally uses the term "video block" to refer to a decoding node of a CU.
  • the term "video block” may also be used herein to refer to a tree block containing a decoding node as well as a PU and a TU, eg, an LCU or CU.
  • a video sequence usually contains a series of video frames or images.
• a group of pictures (GOP) illustratively includes a series of one or more video images.
  • the GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere, the syntax data describing the number of images included in the GOP.
• each slice of an image may contain slice syntax data describing the encoding mode of the corresponding slice.
• Video encoder 20 typically operates on video blocks within individual video slices to encode the video data.
  • a video block may correspond to a decoding node within a CU.
  • Video blocks may have fixed or varying sizes and may vary in size depending on the specified decoding criteria.
• HM supports prediction with various PU sizes. Assuming that the size of a particular CU is 2N×2N, HM supports intra prediction with a PU size of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not divided, while the other direction is divided into 25% and 75%.
• the portion of the CU corresponding to the 25% partition is indicated by "n" followed by "Up (U)", "Down (D)", "Left (L)", or "Right (R)".
• "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
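As an illustrative sketch (not part of the standard text), the PU dimensions produced by each HM partition mode for a 2N×2N CU can be tabulated:

```python
def pu_sizes(mode, n):
    # (width, height) of each PU for a 2Nx2N CU under the HM partition
    # modes described above (symmetric and asymmetric).
    two_n = 2 * n
    table = {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        "NxN":   [(n, n)] * 4,
        # Asymmetric: one direction undivided, the other split 25%/75%.
        "2NxnU": [(two_n, n // 2), (two_n, 3 * n // 2)],   # 2Nx0.5N on top
        "2NxnD": [(two_n, 3 * n // 2), (two_n, n // 2)],   # 2Nx0.5N at bottom
        "nLx2N": [(n // 2, two_n), (3 * n // 2, two_n)],   # 0.5Nx2N on left
        "nRx2N": [(3 * n // 2, two_n), (n // 2, two_n)],   # 0.5Nx2N on right
    }
    return table[mode]
```

For N = 16 (a 32×32 CU), mode 2N×nU yields a 32×8 PU above a 32×24 PU.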
• "N×N" and "N by N" are used interchangeably to refer to the pixel size of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
  • an NxN block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
  • the pixels in the block can be arranged in rows and columns. Further, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • a block may include N x M pixels, where M is not necessarily equal to N.
  • video encoder 20 may calculate residual data (also referred to as residual) of the TU of the CU.
• a PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
  • the residual data may correspond to a pixel difference between the pixels of the uncoded image and the predicted value of the PU.
  • Video encoder 20 may form a TU that includes residual data for the CU, and then transform the TU to generate transform coefficients for the CU.
  • video encoder 20 may perform quantization of the transform coefficients.
  • Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
  • the quantization process can reduce the bit depth associated with some or all of the coefficients. For example, the n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
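As an illustrative sketch (not part of the standard text), the bit-depth reduction described above can be shown by dropping low-order bits; this is a deliberate simplification, since actual HEVC quantization uses a quantization parameter and scaling:

```python
def requantize(value, n_bits, m_bits):
    # Round an n-bit coefficient down to an m-bit value by discarding the
    # low-order (n - m) bits, illustrating how quantization can reduce
    # the bit depth associated with a coefficient.
    assert n_bits > m_bits >= 1
    return value >> (n_bits - m_bits)
```

For example, the 8-bit value 183 (0b10110111) becomes the 4-bit value 11 (0b1011).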
  • the JEM model further improves the coding structure of video images.
  • a block coding structure called "Quad Tree Combined Binary Tree" (QTBT) is introduced.
• the QTBT structure abandons the concepts of CU, PU, and TU in HEVC, and supports more flexible CU partition shapes.
  • One CU can be square or rectangular.
  • a CTU first performs quadtree partitioning, and the leaf nodes of the quadtree further perform binary tree partitioning.
• there are two division modes in binary tree division: symmetric horizontal division and symmetric vertical division.
  • the leaf nodes of the binary tree are called CUs, and the CUs of the JEM cannot be further divided during the prediction and transformation process, that is, the CUs, PUs, and TUs of the JEM have the same block size.
  • the maximum size of the CTU is 256 ⁇ 256 luma pixels.
  • video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce an entropy encoded serialized vector.
• video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy-code the one-dimensional vector based on context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method.
  • Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 to decode the video data.
  • video encoder 20 may assign contexts within the context model to the symbols to be transmitted.
  • the context can be related to whether the adjacent value of the symbol is non-zero.
  • video encoder 20 may select a variable length code for the symbol to be transmitted. Codewords in variable length coding (VLC) may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, using VLC can save code rate relative to using equal-length codewords for each symbol to be transmitted.
  • the probability in CABAC can be determined based on the context assigned to the symbol.
  • a video encoder may perform inter prediction to reduce temporal redundancy between images.
  • a CU may have one or more prediction units PU as specified by different video compression codec standards.
  • multiple PUs may belong to the CU, or the PUs and CUs may be the same size.
  • the partition mode of the CU may be that the CU is not divided, or that it is divided into one PU, in which case the PU is used uniformly for expression.
  • the video encoder can signal the video decoder for motion information for the PU.
  • the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
  • the motion vector may indicate a displacement between an image block (also referred to as a video block, a block of pixels, a set of pixels, etc.) of the PU and a reference image block of the PU.
  • the reference image block of the PU may be a part of the reference image that is similar to the image block of the PU.
  • the reference image block may be located in a reference image indicated by the reference image index and the prediction direction indicator.
  • the video encoder may generate a candidate predicted motion vector (Motion Vector, MV) list for each PU according to the merge prediction mode or the advanced motion vector prediction mode process.
  • MV: Motion Vector
  • Each candidate predicted motion vector in the candidate predicted motion vector list for the PU may indicate motion information.
  • the motion information indicated by some candidate predicted motion vectors in the candidate predicted motion vector list may be based on the motion information of other PUs. If a candidate predicted motion vector indicates the motion information of one of the specified spatial candidate predicted motion vector positions or temporal candidate predicted motion vector positions, the present application may refer to that candidate predicted motion vector as an "original" candidate predicted motion vector.
  • a merge mode also referred to herein as a merge prediction mode
  • the video encoder may generate additional candidate predicted motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying original candidate prediction motion vectors, or simply inserting zero motion vectors as candidate prediction motion vectors.
  • These additional candidate prediction motion vectors are not considered to be original candidate prediction motion vectors and may be referred to as artificially generated candidate prediction motion vectors in this application.
  • the techniques of the present application generally relate to techniques for generating a list of candidate predictive motion vectors at a video encoder and techniques for generating a list of identical candidate motion vectors at a video decoder.
  • the video encoder and video decoder may generate the same candidate prediction motion vector list by implementing the same techniques used to construct the candidate prediction motion vector list. For example, both a video encoder and a video decoder may construct a list with the same number of candidate predicted motion vectors (eg, five candidate predicted motion vectors).
  • the video encoder and decoder may first consider spatial candidate predicted motion vectors (eg, neighboring blocks in the same image), then consider temporal candidate predicted motion vectors (eg, candidate predicted motion vectors in different images), and may finally consider artificially generated candidate predicted motion vectors, until the desired number of candidate predicted motion vectors has been added to the list.
  • a pruning operation may be utilized for certain types of candidate predicted motion vectors to remove repetitions from the candidate predicted motion vector list during its construction, while for other types of candidate predicted motion vectors pruning may not be used, in order to reduce decoder complexity.
  • a pruning operation may be performed to exclude candidate prediction motion vectors having repeated motion information from a list of candidate prediction motion vectors.
  • the artificially generated candidate predicted motion vectors may be added without performing a pruning operation on them.
  • the video encoder may select a candidate prediction motion vector from the candidate prediction motion vector list and output a candidate prediction motion vector index in the code stream.
  • the selected candidate predicted motion vector may be a candidate predicted motion vector having a motion vector that produces a predictor that most closely matches the target PU being decoded.
  • the candidate predicted motion vector index may indicate the location of the candidate predicted motion vector selected in the candidate predicted motion vector list.
  • the video encoder may also generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate predicted motion vector.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate predicted motion vector.
  • the video encoder may generate one or more residual image blocks for the CU based on the predictive image block of the PU of the CU and the original image block for the CU. The video encoder may then encode one or more residual image blocks and output one or more residual image blocks in the code stream.
  • the code stream may include data identifying the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU.
  • the video decoder may determine motion information for the PU based on motion information indicated by the selected candidate predicted motion vector in the candidate motion vector list of the PU.
  • the video decoder may identify one or more reference image blocks for the PU based on the motion information of the PU. After identifying one or more reference image blocks of the PU, the video decoder may generate a predictive image block for the PU based on one or more reference image blocks of the PU.
  • the video decoder may reconstruct an image block for the CU based on the predictive image block for the PU of the CU and one or more residual image blocks for the CU.
  • the present application may describe a location or image block as having various spatial relationships with a CU or PU. This description may be interpreted to mean that the location or image block and the image block associated with the CU or PU have various spatial relationships.
  • the present application may refer to a PU that is currently being decoded by a video decoder as a current PU, also referred to as a current image block to be processed.
  • the present application may refer to a CU currently being decoded by a video decoder as a current CU.
  • the present application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that the present application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used uniformly.
  • video encoder 20 may use inter prediction to generate predictive image blocks and motion information for the PU of the CU.
  • the motion information for a given PU may be the same or similar to the motion information of one or more nearby PUs (ie, PUs whose image blocks are spatially or temporally near the image block of a given PU). Because nearby PUs often have similar motion information, video encoder 20 may encode motion information for a given PU with reference to motion information for nearby PUs. Encoding motion information for a given PU with reference to motion information of nearby PUs may reduce the number of coded bits in the codestream that are required to indicate motion information for a given PU.
  • Video encoder 20 may encode motion information for a given PU with reference to the motion information of nearby PUs in various manners. For example, video encoder 20 may indicate that the motion information for the given PU is the same as the motion information of a nearby PU. The present application may use "merge mode" to refer to indicating that the motion information of a given PU is the same as, or may be derived from, the motion information of a nearby PU. In another possible implementation, video encoder 20 may calculate a Motion Vector Difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU.
  • MVD: Motion Vector Difference
  • Video encoder 20 may include the MVD instead of the motion vector of a given PU in the motion information for a given PU.
  • representing the MVD in the code stream requires fewer coded bits than representing the motion vector of the given PU.
  • the present application may use advanced motion vector prediction mode to refer to signaling the motion information of a given PU by using the MVD and identifying the index value of the candidate motion vector.
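Why the MVD costs fewer bits than the full motion vector can be illustrated with signed 0th-order Exp-Golomb coding, the kind of variable-length code used for MVD syntax elements in H.264/HEVC. The specific vector and predictor values below are made up for the example.

```python
import math

def se_golomb_bits(v: int) -> int:
    """Bit length of the signed value v under 0th-order signed
    Exp-Golomb coding (mapping 0->0, 1->1, -1->2, 2->3, ...)."""
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    return 2 * int(math.log2(code_num + 1)) + 1

def mvd_saving(mv, predictor):
    """Compare coding a motion vector directly against coding only its
    difference (MVD) from a nearby PU's predicted motion vector."""
    mvd = tuple(a - b for a, b in zip(mv, predictor))
    direct = sum(se_golomb_bits(c) for c in mv)        # bits for the full MV
    differential = sum(se_golomb_bits(c) for c in mvd) # bits for the MVD
    return mvd, direct, differential
```

Because nearby PUs tend to move similarly, the MVD components are small, and small values get the shortest Exp-Golomb codewords.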
  • video encoder 20 may generate a candidate predicted motion vector list for a given PU.
  • the candidate predicted motion vector list may include one or more candidate predicted motion vectors.
  • Each of the candidate predicted motion vectors in the candidate predicted motion vector list for a given PU may specify motion information.
  • the motion information indicated by each candidate prediction motion vector may include a motion vector, a reference image index, and a prediction direction indicator.
  • the candidate predicted motion vectors in the candidate predicted motion vector list may include "original" candidate predicted motion vectors, each of which indicates the motion information of one of the specified candidate predicted motion vector positions within a PU different from the given PU.
  • video encoder 20 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, the video encoder may compare each candidate predicted motion vector to the PU being decoded and may select a candidate predicted motion vector having the desired rate-distortion penalty. Video encoder 20 may output a candidate predicted motion vector index for the PU. The candidate predicted motion vector index may identify the location of the selected candidate predicted motion vector in the candidate predicted motion vector list.
  • video encoder 20 may generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU.
  • the motion information of the PU may be determined based on motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. For example, in the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, motion information of the PU may be determined based on a motion vector difference for the PU and motion information indicated by the selected candidate predicted motion vector.
  • Video encoder 20 may process the predictive image blocks for the PU as previously described.
  • video decoder 30 may generate a candidate predicted motion vector list for each of the PUs of the CU.
  • the candidate predicted motion vector list generated by the video decoder 30 for the PU may be the same as the candidate predicted motion vector list generated by the video encoder 20 for the PU.
  • the syntax element parsed from the code stream may indicate the location of the candidate prediction motion vector selected in the candidate prediction motion vector list for the PU.
  • video decoder 30 may generate a predictive image block for the PU based on one or more reference image blocks indicated by the motion information of the PU.
  • Video decoder 30 may determine motion information for the PU based on motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. Video decoder 30 may reconstruct an image block for the CU based on the predictive image block for the PU and the residual image block for the CU.
  • the construction of the candidate predicted motion vector list and the parsing, from the code stream, of the position of the selected candidate predicted motion vector in that list are independent of each other, and may be performed in any order, sequentially or in parallel.
  • in a possible implementation, the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the code stream, and the candidate predicted motion vector list is then constructed only as far as the parsed position, at which point the candidate predicted motion vector at that position can be determined.
  • for example, when parsing the code stream reveals that the selected candidate predicted motion vector is the one with index 3 in the candidate predicted motion vector list, only the candidates with index 0 through index 3 need to be constructed, and the candidate predicted motion vector with index 3 can be determined from that partial list, which achieves the technical effect of reducing complexity and improving decoding efficiency.
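The decoder-side shortcut of building the candidate list only up to the parsed index can be sketched as below. The candidate source is any iterable yielding candidates in list order, a hypothetical stand-in for the derivation process.

```python
def decode_selected_candidate(parsed_index, candidate_source):
    """Construct the candidate predicted motion vector list only up to
    the index parsed from the code stream, then return the candidate at
    that position; later candidates are never derived."""
    partial_list = []
    for cand in candidate_source:
        partial_list.append(cand)
        if len(partial_list) == parsed_index + 1:
            return partial_list[parsed_index]
    raise ValueError("code stream indicated an index beyond the list")
```

With a parsed index of 3, only the candidates with index 0 through 3 are derived; the derivation of any candidate after index 3 is skipped entirely.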
  • video encoder 20 includes a partitioning unit 35, a prediction unit 41, a reference image memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56.
  • the prediction unit 41 includes a motion estimation unit 42, a motion compensation unit 44, and an intra prediction module 46.
  • video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62.
  • a deblocking filter (not shown in Figure 2) may also be included to filter the block boundaries to remove blockiness artifacts from the reconstructed video. The deblocking filter will typically filter the output of summer 62 as needed. In addition to the deblocking filter, an additional loop filter (in-loop or post-loop) can also be used.
  • video encoder 20 receives video data, and segmentation unit 35 segments the data into video blocks.
  • This partitioning may also include partitioning into strips, image blocks, or other larger units, and, for example, video block partitioning based on the quadtree structure of the LCU and CU.
  • Video encoder 20 exemplarily illustrates the components that encode video blocks within a video slice to be encoded. In general, a slice may be partitioned into multiple video blocks (and possibly into collections of video blocks called image blocks).
  • Prediction unit 41 may select one of a plurality of possible coding modes for the current video block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, based on the encoding quality and cost calculation results (eg, rate-distortion cost, RDcost). Prediction unit 41 may provide the resulting intra-coded or inter-coded block to summer 50 to generate residual block data, and provide the resulting intra-coded or inter-coded block to summer 62 to reconstruct the encoded block for use as a reference image.
  • RDcost: rate-distortion cost
  • Motion estimation unit 42 and motion compensation unit 44 within prediction unit 41 perform inter-predictive decoding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.
  • Motion estimation unit 42 may be configured to determine an inter prediction mode for the video stripe based on a predetermined pattern of the video sequence. The predetermined mode specifies the video strips in the sequence as P strips, B strips, or GPB strips.
  • Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are separately illustrated for conceptual purposes.
  • the motion estimation performed by motion estimation unit 42 is the process of generating motion vectors that estimate the motion of video blocks.
  • the motion vector may indicate the displacement of the PU of the video block within the current video frame or image relative to the predictive block within the reference image.
  • the predictive block is a block of a PU that is found to closely match the video block to be decoded in terms of pixel difference, and the pixel difference may be determined by a sum of absolute differences, a sum of squared differences, or another difference metric.
  • video encoder 20 may calculate a value of a sub-integer pixel location of a reference image stored in reference image memory 64. For example, video encoder 20 may interpolate values of a quarter pixel position, an eighth pixel position, or other fractional pixel position of a reference image. Accordingly, motion estimation unit 42 may perform a motion search with respect to the full pixel position and the fractional pixel position and output a motion vector having fractional pixel precision.
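Interpolating values at fractional pixel positions, such as the quarter-pixel positions mentioned above, can be sketched with a simple bilinear filter. Real codecs use longer separable interpolation filters, so this is only illustrative; the function name and coordinate convention are assumptions.

```python
def interp_sample(ref, qx, qy):
    """Interpolate one sample from a 2-D reference image at quarter-pel
    precision using a bilinear filter. qx, qy are coordinates in
    quarter-pel units, so (qx >> 2, qy >> 2) is the integer position."""
    step = 4                                  # 4 quarter-pel offsets per full pel
    ix, iy = qx // step, qy // step           # integer-pel position
    fx, fy = qx % step, qy % step             # fractional offsets, 0..3
    def px(col, row):                         # fetch with border clamping
        row = min(max(row, 0), len(ref) - 1)
        col = min(max(col, 0), len(ref[0]) - 1)
        return ref[row][col]
    top = px(ix, iy) * (step - fx) + px(ix + 1, iy) * fx
    bot = px(ix, iy + 1) * (step - fx) + px(ix + 1, iy + 1) * fx
    return (top * (step - fy) + bot * fy + step * step // 2) // (step * step)
```

At an integer position the filter returns the stored sample unchanged; at the half-pel center of four samples it returns their rounded average.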
  • Motion estimation unit 42 calculates the motion vector of the PU of the video block in the inter-coded slice by comparing the location of the PU with the location of the predictive block of the reference picture.
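The comparison of the PU's location with candidate predictive block locations can be sketched as a toy exhaustive full-pel motion search minimizing the sum of absolute differences (SAD). Real encoders use fast search strategies rather than brute force, and this omits the fractional-pel refinement described above.

```python
def full_pel_motion_search(cur, ref, bx, by, bsize, search_range):
    """Return the displacement (dx, dy) whose reference block minimizes
    the SAD against the current block at (bx, by), plus that SAD."""
    h, w = len(ref), len(ref[0])
    best_mv, best_sad = None, float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + bsize > w or ry + bsize > h:
                continue                      # candidate block leaves the reference image
            sad = sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                      for j in range(bsize) for i in range(bsize))
            if sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad
```

When the reference frame is the current frame shifted one pixel to the right, the search recovers the motion vector (1, 0) with zero residual.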
  • the reference images may be selected from a first reference image list (List 0) or a second reference image list (List 1), each of the lists identifying one or more reference images stored in the reference image memory 64.
  • Motion estimation unit 42 transmits the computed motion vector to entropy encoding unit 56 and motion compensation unit 44.
  • Motion compensation performed by motion compensation unit 44 may involve extracting or generating predictive blocks based on motion vectors determined by motion estimation, possibly performing interpolation to sub-pixel precision. After receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the predictive block pointed to by the motion vector in one of the reference image lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being decoded, thereby forming pixel difference values. The pixel difference values form residual data for the block and may include both luminance and chrominance difference components. Summer 50 represents one or more components that perform this subtraction. Motion compensation unit 44 may also generate syntax elements associated with video blocks and video slices for video decoder 30 to use to decode video blocks of video slices.
  • the image containing the PU may be associated with two reference image lists called "List 0" and "List 1".
  • an image containing B strips may be associated with a list combination that is a combination of List 0 and List 1.
  • motion estimation unit 42 may perform uni-directional prediction or bi-directional prediction for the PU, where, in some possible implementations, bi-directional prediction is prediction based on images in the List 0 and List 1 reference image lists, respectively.
  • in other possible implementations, bi-directional prediction is prediction based on the reconstructed future frame and the reconstructed past frame of the current frame in display order, respectively.
  • the motion estimation unit 42 may search for a reference image block for the PU in the reference image of list 0 or list 1.
  • Motion estimation unit 42 may then generate a reference index indicating a reference image containing the reference image block in list 0 or list 1 and a motion vector indicating a spatial displacement between the PU and the reference image block.
  • the motion estimation unit 42 may output a reference index, a prediction direction identifier, and a motion vector as motion information of the PU.
  • the prediction direction indicator may indicate that the reference index indicates the reference image in list 0 or list 1.
  • Motion compensation unit 44 may generate a predictive image block of the PU based on the reference image block indicated by the motion information of the PU.
  • motion estimation unit 42 may search for a reference image block for the PU in the reference images in list 0 and may also search for a reference image block for the PU in the reference images in list 1. Motion estimation unit 42 may then generate reference indices indicating the reference images containing the reference image blocks in list 0 and list 1, and motion vectors indicating the spatial displacements between the reference image blocks and the PU. Motion estimation unit 42 may output the reference indices and motion vectors of the PU as the motion information of the PU. Motion compensation unit 44 may generate the predictive image block of the PU based on the reference image blocks indicated by the motion information of the PU.
  • motion estimation unit 42 does not output a complete set of motion information for the PU to entropy encoding module 56. Rather, motion estimation unit 42 may signal the motion information of the PU with reference to the motion information of another PU. For example, motion estimation unit 42 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighboring PU. In this implementation, motion estimation unit 42 may indicate, in a syntax structure associated with the PU, an indication value that indicates to video decoder 30 that the PU has the same motion information as the neighboring PU or has motion information that can be derived from the neighboring PU.
  • motion estimation unit 42 may identify candidate predicted motion vectors and motion vector differences associated with neighboring PUs in a syntax structure associated with the PU.
  • the MVD indicates the difference between the motion vector of the PU and the indicated candidate predicted motion vector associated with the neighboring PU.
  • Video decoder 30 may use the indicated candidate predicted motion vector and MVD to determine the motion vector of the PU.
  • prediction module 41 may generate a list of candidate predicted motion vectors for each PU of the CU.
  • One or more of the candidate predicted motion vector lists may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors.
  • Intra prediction unit 46 within prediction unit 41 may perform intra-predictive decoding of the current video block relative to one or more neighboring blocks in the same image or slice as the current block to be decoded, to provide spatial compression.
  • intra-prediction unit 46 may intra-predict the current block.
  • intra prediction unit 46 may determine an intra prediction mode to encode the current block.
  • intra-prediction unit 46 may encode the current block using various intra-prediction modes, for example, during separate encoding passes, and intra-prediction unit 46 (or, in some possible implementations, mode selection unit 40) may select an appropriate intra prediction mode to use from the tested modes.
  • the video encoder 20 forms a residual video block by subtracting the predictive block from the current video block.
  • the residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52.
  • Transform processing unit 52 transforms the residual video data into residual transform coefficients using, for example, a discrete cosine transform (DCT) or a conceptually similar transform (eg, a discrete sine transform, DST).
  • Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain (eg, a frequency domain).
  • Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54.
  • Quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process can reduce the bit depth associated with some or all of the coefficients. The degree of quantization can be modified by adjusting the quantization parameters. In some possible implementations, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform a scan.
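The scan of the matrix of quantized coefficients into a one-dimensional vector can be sketched with a zig-zag scan, one common predefined scan order; actual codecs define several scan orders (diagonal, horizontal, vertical) chosen per mode, so this is illustrative.

```python
def zigzag_scan(block):
    """Serialize a square block of quantized transform coefficients into
    a one-dimensional vector following a zig-zag scan order: walk the
    anti-diagonals, alternating direction on odd and even diagonals."""
    n = len(block)
    order = sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                       # anti-diagonal index
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    return [block[r][c] for r, c in order]
```

This ordering tends to group the low-frequency (typically non-zero) coefficients at the front of the vector, which helps the subsequent entropy coding.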
  • entropy encoding unit 56 may entropy encode the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding, context adaptive binary arithmetic coding, syntax-based context adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another entropy encoding method or technique. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being coded. After entropy encoding by entropy encoding unit 56, the encoded code stream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30.
  • Entropy encoding unit 56 may encode information indicative of a selected intra prediction mode in accordance with the techniques of the present application.
  • Video encoder 20 may include, in the transmitted code stream configuration data, definitions of the encoding contexts for the various blocks, which may include a plurality of intra prediction mode index tables and a plurality of modified intra prediction mode index tables (also referred to as codeword mapping tables), as well as indications of the MPM, the intra prediction mode index table, and the modified intra prediction mode index table to be used for each of the contexts.
  • Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference image block of the reference image.
  • Motion compensation unit 44 may calculate the reference image block by adding the residual block to the predictive block of one of the reference images within one of the reference image lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation.
  • Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reference image block for storage in reference image memory 64.
  • the reference image block may be used by the motion estimation unit 42 and the motion compensation unit 44 as a reference image block to inter-predict subsequent video frames or blocks in the image.
  • FIG. 3 is a schematic block diagram of a video decoder 30 in the embodiment of the present application.
  • video decoder 30 includes an entropy encoding unit 80, a prediction unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference image memory 92.
  • the prediction unit 81 includes a motion compensation unit 82 and an intra prediction unit 84.
  • video decoder 30 may perform a decoding process that is exemplarily reciprocal to the encoding process of video encoder 20 described with respect to FIG. 2.
  • video decoder 30 receives from video encoder 20 an encoded video bitstream representing the video blocks of the encoded video slice and associated syntax elements.
  • Entropy encoding unit 80 of video decoder 30 entropy decodes the code stream to produce quantized coefficients, motion vectors, and other syntax elements.
  • the entropy encoding unit 80 forwards the motion vectors and other syntax elements to the prediction unit 81.
  • Video decoder 30 may receive syntax elements at the video stripe level and/or video block level.
  • intra-prediction unit 84 of prediction unit 81 may generate prediction data for the video block of the current video slice based on the signaled intra-prediction mode and data from previously decoded blocks of the current frame or image.
  • when the video image is decoded into an inter-decoded slice, motion compensation unit 82 of prediction unit 81 generates a predictive block for the video block of the current video image based on the motion vectors and other syntax elements received from entropy encoding unit 80.
  • the predictive block may be generated from one of the reference images within one of the reference image lists.
  • Video decoder 30 may construct a reference image list (List 0 and List 1) using default construction techniques based on reference images stored in reference image memory 92.
  • Motion compensation unit 82 determines the prediction information for the video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate a predictive block for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine the prediction mode (eg, intra prediction or inter prediction) used to decode the video blocks of the video slice, the inter prediction slice type, construction information for one or more of the reference image lists of the slice, the motion vector of each inter-coded video block of the slice, the inter prediction status of each inter-coded video block of the slice, and other information used to decode the video blocks in the current video slice.
  • prediction mode: eg, intra prediction or inter prediction
  • Motion compensation unit 82 may also perform interpolation based on the interpolation filter. Motion compensation unit 82 may calculate the interpolated values of the sub-integer pixels of the reference image block using an interpolation filter as used by video encoder 20 during encoding of the video block. In this application, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use an interpolation filter to generate the predictive blocks.
  • motion compensation unit 82 may generate a candidate predicted motion vector list for the PU. Data identifying the location of the selected candidate predicted motion vector in the candidate motion vector list of the PU may be included in the code stream. After generating the candidate prediction motion vector list for the PU, motion compensation unit 82 may generate a predictive image block for the PU based on one or more reference image blocks indicated by the motion information of the PU. The reference image block of the PU may be in a different time image than the PU. Motion compensation unit 82 may determine motion information for the PU based on the selected motion information from the candidate motion vector list of the PU.
  • Inverse quantization unit 86 inverse quantizes (eg, dequantizes) the quantized transform coefficients provided in the codestream and decoded by entropy encoding unit 80.
• the inverse quantization process may include using a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied.
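• The relationship between quantization at the encoder and inverse quantization at the decoder can be sketched as follows. This is a toy illustration only, assuming a simple uniform quantizer with a step size derived from the quantization parameter; it is not the codec's actual scaling formula, and the function names are invented.

```python
# Toy uniform quantizer: the encoder divides a transform coefficient by
# a step size and rounds; the decoder multiplies the received level by
# the same step size to approximate the original coefficient.

def quantize(coeff: float, step: float) -> int:
    """Encoder side: map a coefficient to an integer level."""
    return round(coeff / step)

def dequantize(level: int, step: float) -> float:
    """Decoder side (inverse quantization): scale the level back up."""
    return level * step

level = quantize(13.2, step=4.0)     # 13.2 / 4.0 rounds to 3
recon = dequantize(level, step=4.0)  # 3 * 4.0 = 12.0, near the original
```

The coarser the quantization chosen by the encoder (the larger the step), the larger the reconstruction error remaining after inverse quantization.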
  • Inverse transform unit 88 applies an inverse transform (eg, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients to produce a residual block in the pixel domain.
• video decoder 30 sums the residual block from inverse transform unit 88 with the corresponding predictive block generated by motion compensation unit 82 to form a decoded video block.
  • Summer 90 represents one or more components that perform this summation operation.
  • a deblocking filter can also be applied to filter the decoded blocks to remove blockiness artifacts as needed.
  • Other loop filters can also be used to smooth pixel transitions or otherwise improve video quality.
  • the decoded video block in a given frame or image is then stored in a reference image memory 92, which stores a reference image for subsequent motion compensation.
• the reference image memory 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.
• the techniques of the present application illustratively relate to inter-frame decoding. It should be understood that the techniques of the present application can be performed by any of the video decoders described in this application, including video encoder 20 and video decoder 30 as shown and described with respect to FIGS. 1 through 3. That is, in one possible implementation, the prediction unit 41 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of blocks of video data. In another possible implementation, the prediction unit 81 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of blocks of video data. Thus, references to a generic "video encoder" or "video decoder" may include video encoder 20, video decoder 30, or another video encoding or decoding unit.
  • FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application.
  • Inter prediction module 121 may include motion estimation unit 42 and motion compensation unit 44. In different video compression codec standards, the relationship between PU and CU is different.
• the inter prediction module 121 may divide the current CU into PUs according to multiple partition modes. For example, the inter prediction module 121 may divide the current CU into PUs according to the 2N×2N, 2N×N, N×2N, and N×N partition modes. In other embodiments, the current CU is the current PU; this is not limited herein.
• the inter prediction module 121 may perform Integer Motion Estimation (IME) and then Fractional Motion Estimation (FME) on each of the PUs.
  • the inter prediction module 121 may search for a reference image block for the PU in one or more reference images. After finding the reference image block for the PU, the inter prediction module 121 can generate a motion vector that indicates the spatial displacement between the PU and the reference image block for the PU with integer precision.
• by performing FME on the PU, the inter prediction module 121 may refine the motion vector generated by performing the IME on the PU.
  • a motion vector generated by performing FME on a PU may have sub-integer precision (eg, 1/2 pixel precision, 1/4 pixel precision, etc.).
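• As a toy illustration of sub-integer precision, a motion vector with 1/4-pixel precision can be stored as integer quarter-pel units, with the low two bits selecting the fractional phase. The representation and the names below are illustrative assumptions, not taken from the patent.

```python
# Hypothetical quarter-pel motion vector representation: each component
# is stored as an integer count of quarter pixels.

def to_quarter_pel(dx_pixels: float, dy_pixels: float):
    """Convert a displacement in pixels to integer quarter-pel units."""
    return (round(dx_pixels * 4), round(dy_pixels * 4))

def split_component(mv_qpel: int):
    """Split one non-negative component into its integer-pixel part and
    its fractional phase (0..3); the phase would select the interpolation
    filter used to build the predictive block."""
    return (mv_qpel >> 2, mv_qpel & 3)

mv = to_quarter_pel(2.75, -1.5)          # stored as (11, -6)
int_part, phase = split_component(mv[0]) # 11 -> 2 full pixels, phase 3
```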
  • the inter prediction module 121 can use the motion vector for the PU to generate a predictive image block for the PU.
  • the inter prediction module 121 may generate a candidate prediction motion vector list for the PU.
  • the candidate predicted motion vector list may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors.
  • the inter prediction module 121 may select the candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference for the PU.
  • the MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate prediction motion vector and the motion vector generated for the PU using the IME and FME.
  • inter prediction module 121 may output a candidate predicted motion vector index that identifies the location of the selected candidate predicted motion vector in the candidate predicted motion vector list.
  • the inter prediction module 121 can also output the MVD of the PU.
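• The encoder-side AMVP outputs described above (a candidate index plus an MVD) can be sketched as follows. This is a hedged sketch, not the patent's exact selection rule: it picks the candidate whose vector is closest to the motion vector found by IME/FME, using MVD magnitude as a rough proxy for the bits needed to code the difference. All names are illustrative.

```python
# Sketch: choose an AMVP predictor and compute the motion vector
# difference (MVD) that would be signalled alongside its index.

def amvp_choose(candidates, mv):
    """candidates: list of (x, y) predictor vectors; mv: the motion
    vector produced by motion estimation. Returns (index, mvd)."""
    best_idx, best_mvd, best_cost = 0, None, float("inf")
    for idx, (px, py) in enumerate(candidates):
        mvd = (mv[0] - px, mv[1] - py)
        cost = abs(mvd[0]) + abs(mvd[1])  # crude stand-in for MVD bits
        if cost < best_cost:
            best_idx, best_mvd, best_cost = idx, mvd, cost
    return best_idx, best_mvd

idx, mvd = amvp_choose([(4, 0), (9, -2)], mv=(8, -1))
# the second candidate is closer, so idx == 1 and mvd == (-1, 1)
```

The decoder would then recover the motion vector as predictor + MVD.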
  • a possible implementation of the Advanced Motion Vector Prediction (AMVP) mode in the embodiment of the present application is described in detail below.
  • the inter prediction module 121 may also perform a Merge operation on each of the PUs.
  • the inter prediction module 121 may generate a candidate predicted motion vector list for the PU.
  • the candidate prediction motion vector list for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
  • the original candidate prediction motion vector in the candidate prediction motion vector list may include one or more spatial candidate prediction motion vectors and temporal candidate prediction motion vectors.
  • the spatial candidate prediction motion vector may indicate motion information of other PUs in the current image.
  • the temporal candidate prediction motion vector may be based on motion information of a corresponding PU that is different from the current image.
  • the temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
  • the inter prediction module 121 may select one of the candidate predicted motion vectors from the candidate predicted motion vector list. Inter prediction module 121 may then generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • Figure 5, described below, illustrates an exemplary flow chart of Merge.
• for each PU, the inter prediction module 121 may select either the predictive image block generated by the FME operation or the predictive image block generated by the merge operation. In some possible implementations, the inter prediction module 121 may select a predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
• the inter prediction module 121 can select the partition mode for the current CU. In some implementations, the inter prediction module 121 can select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partition modes.
  • the inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to the residual generation module 102.
  • the inter prediction module 121 may output syntax elements indicating motion information of the PUs belonging to the selected partition mode to the entropy encoding module 116.
• the inter prediction module 121 includes IME modules 180A through 180N (collectively referred to as "IME module 180"), FME modules 182A through 182N (collectively referred to as "FME module 182"), merge modules 184A through 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A through 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a CTU-to-CU mode decision process).
  • IME module 180, FME module 182, and merge module 184 can perform IME operations, FME operations, and merge operations on PUs of the current CU.
  • the inter prediction module 121 is illustrated in the diagram of FIG. 4 as a separate IME module 180, FME module 182, and merge module 184 for each PU for each partition mode of the CU. In other possible implementations, inter prediction module 121 does not include separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU.
  • the IME module 180A, the FME module 182A, and the merging module 184A may perform an IME operation, an FME operation, and a merge operation on a PU generated by dividing a CU according to a 2N ⁇ 2N partition mode.
  • PU mode decision module 186A may select one of the predictive image blocks generated by IME module 180A, FME module 182A, and merge module 184A.
  • the IME module 180B, the FME module 182B, and the merging module 184B may perform an IME operation, an FME operation, and a merge operation on a left PU generated by dividing a CU according to an N ⁇ 2N partition mode.
  • PU mode decision module 186B may select one of the predictive image blocks generated by IME module 180B, FME module 182B, and merge module 184B.
  • the IME module 180C, the FME module 182C, and the merging module 184C may perform an IME operation, an FME operation, and a merge operation on a right PU generated by dividing a CU according to an N ⁇ 2N partition mode.
  • the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
• the IME module 180N, the FME module 182N, and the merging module 184N may perform an IME operation, an FME operation, and a merge operation on the lower-right PU generated by dividing the CU according to the N×N partition mode.
  • the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
• the PU mode decision module 186 can select a predictive image block based on a rate-distortion cost analysis of the multiple possible predictive image blocks, choosing the predictive image block that provides the optimal rate-distortion trade-off for a given decoding situation. Illustratively, for bandwidth-limited applications, PU mode decision module 186 may prefer predictive image blocks that increase the compression ratio, while for other applications it may prefer predictive image blocks that increase reconstructed video quality. After the PU mode decision module 186 selects the predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partition mode.
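• The rate-distortion decision described above can be sketched with a Lagrangian cost J = D + λ·R. The numbers, names, and λ below are made up for illustration; real encoders derive λ from the quantization parameter.

```python
# Sketch of rate-distortion mode selection: each candidate predictive
# block has a distortion D and a rate R (bits); pick the block with the
# lowest Lagrangian cost J = D + lam * R.

def rd_select(blocks, lam):
    """blocks: list of (name, distortion, rate_bits) tuples."""
    return min(blocks, key=lambda b: b[1] + lam * b[2])[0]

modes = [("IME", 120.0, 10), ("FME", 90.0, 14), ("merge", 100.0, 4)]
best = rd_select(modes, lam=5.0)
# J(IME) = 170, J(FME) = 160, J(merge) = 120, so "merge" wins
```

A small λ favors low distortion (reconstructed quality); a large λ favors a low rate (compression ratio), matching the two application preferences mentioned above.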
  • FIG. 5 is an exemplary flowchart of a merge mode in the embodiment of the present application.
• A video encoder (e.g., video encoder 20) may perform the merge operation 200.
  • the video encoder may perform a merge operation other than the merge operation 200.
• the video encoder may perform a merge operation in which it performs more steps than, fewer steps than, or different steps from those of the merge operation 200.
  • the video encoder may perform the steps of the merge operation 200 in a different order or in parallel.
• the encoder may also perform the merge operation 200 on a PU encoded in skip mode.
  • the video encoder may generate a candidate predicted motion vector list for the current PU (202).
  • the video encoder can generate a list of candidate predicted motion vectors for the current PU in various ways.
  • the video encoder may generate a candidate predicted motion vector list for the current PU according to one of the example techniques described below with respect to Figures 8-12.
  • the candidate predicted motion vector list for the current PU may include a temporal candidate predicted motion vector.
  • the temporal candidate prediction motion vector may indicate motion information of a time-domain co-located PU.
  • the co-located PU may be spatially co-located with the current PU at the same location in the image frame, but in the reference image rather than the current image.
• the present application may refer to the reference image that includes the time-domain co-located PU as the related reference image.
  • the present application may refer to a reference image index of an associated reference image as a related reference image index.
  • the current image may be associated with one or more reference image lists (eg, list 0, list 1, etc.).
  • the reference image index may indicate the reference image by indicating a position in a reference image list of the reference image.
  • the current image can be associated with a combined reference image list.
  • the associated reference image index is a reference image index of the PU that encompasses the reference index source location associated with the current PU.
• the reference index source location associated with the current PU is adjacent to the current PU.
  • a PU may "cover" the particular location if the image block associated with the PU includes a particular location.
  • the video encoder can use a zero reference image index.
  • the reference index source location associated with the current PU is within the current CU.
• the PU that covers the reference index source location associated with the current PU may be considered available if that PU is above or to the left of the current CU.
• the video encoder may need to access the motion information of another PU of the current CU in order to determine the reference image containing the co-located PU. Accordingly, these video encoders may use the motion information (i.e., the reference image index) of a PU belonging to the current CU to generate the temporal candidate predicted motion vector for the current PU. Consequently, the video encoder may not be able to generate the candidate predicted motion vector lists for the current PU and for the PU covering the reference index source location associated with the current PU in parallel.
• a video encoder can explicitly set the related reference image index without referring to the reference image index of any other PU. This may enable the video encoder to generate the candidate predicted motion vector lists for the current PU and the other PUs of the current CU in parallel. Because the video encoder explicitly sets the related reference image index, the related reference image index is not based on the motion information of any other PU of the current CU. In some possible implementations in which the video encoder explicitly sets the related reference image index, the video encoder may always set the related reference image index to a fixed, predefined preset reference image index (e.g., 0).
• the video encoder may generate the temporal candidate predicted motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference image index, and may include the temporal candidate predicted motion vector in the candidate predicted motion vector list of the current CU.
• the video encoder can explicitly signal the related reference image index in a syntax structure (e.g., an image header, a slice header, an APS, or another syntax structure).
  • the video encoder may signal the decoder for an associated reference picture index for each LCU (ie, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the associated reference picture index for each PU of the CU is equal to "1.”
  • the associated reference image index can be set implicitly rather than explicitly.
• the video encoder may generate each temporal candidate predicted motion vector in the candidate predicted motion vector lists for the PUs of the current CU using the motion information of PUs in the reference images indicated by the reference image indices of PUs covering locations outside the current CU, even if these locations are not strictly adjacent to the current PU.
  • the video encoder may generate a predictive image block (204) associated with the candidate predicted motion vector in the candidate predicted motion vector list.
• for each candidate predicted motion vector in the list, the video encoder may determine the motion information of the current PU based on the motion information indicated by that candidate predicted motion vector and then generate the associated predictive image block based on the one or more reference image blocks indicated by the motion information of the current PU.
• The video encoder may then select one of the candidate predicted motion vectors from the candidate predicted motion vector list (206).
  • the video encoder can select candidate prediction motion vectors in a variety of ways. For example, the video encoder may select one of the candidate predicted motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate predicted motion vectors.
  • the video encoder may output a candidate predicted motion vector index (208).
  • the candidate predicted motion vector index may indicate the location of the candidate predicted motion vector selected in the candidate predicted motion vector list.
• the candidate predicted motion vector index may be denoted as "merge_idx".
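• Steps 202 through 208 of the merge operation can be sketched as below. This is a hedged, simplified sketch: the cost function stands in for the rate-distortion analysis of each candidate's predictive image block, and the names are not from the patent.

```python
# Sketch of the encoder-side merge operation: build the candidate list
# (202), evaluate the predictive block each candidate yields (204),
# select the best candidate (206), and output its index, merge_idx (208).

def merge_encode(candidates, cost_of):
    """candidates: motion-information entries; cost_of: maps a candidate
    to the RD cost of its predictive block. Returns merge_idx."""
    costs = [cost_of(c) for c in candidates]
    return costs.index(min(costs))

cands = [(0, 0), (3, 1), (-2, 4)]  # toy candidate list
merge_idx = merge_encode(cands, lambda mv: abs(mv[0]) + abs(mv[1]))
# the toy cost is lowest for (0, 0), so merge_idx == 0
```

In merge mode only merge_idx is signalled; the decoder rebuilds the same candidate list and inherits the indicated candidate's full motion information.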
  • FIG. 6 is an exemplary flowchart of an advanced motion vector prediction mode in an embodiment of the present application.
• A video encoder (e.g., video encoder 20) may generate one or more motion vectors for the current PU (211).
  • the video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU.
  • the current image can be associated with two reference image lists (List 0 and List 1).
  • the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU.
  • the list 0 motion vector may indicate a spatial displacement between an image block of the current PU and a reference image block in the reference image in list 0.
  • the List 1 motion vector may indicate a spatial displacement between an image block of the current PU and a reference image block in the reference image in List 1.
  • the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
  • the video encoder may generate a predictive image block for the current PU (212).
  • the video encoder may generate a predictive image block for the current PU based on one or more reference image blocks indicated by one or more motion vectors for the current PU.
  • the video encoder may generate a list of candidate predicted motion vectors for the current PU (213).
• the video encoder can generate the candidate predicted motion vector list for the current PU in various ways.
  • the video encoder may generate a candidate predicted motion vector list for the current PU in accordance with one or more of the possible implementations described below with respect to Figures 8-12.
  • the candidate prediction motion vector list may be limited to two candidate prediction motion vectors.
  • the candidate prediction motion vector list may include more candidate prediction motion vectors (eg, five candidate prediction motion vectors).
  • the video encoder may generate one or more motion vector differences for each candidate predicted motion vector in the candidate predicted motion vector list (214).
  • the video encoder may generate a motion vector difference for the candidate predicted motion vector by determining a difference between the motion vector indicated by the candidate predicted motion vector and the corresponding motion vector of the current PU.
• If the current PU is predicted in a single direction, the video encoder may generate a single MVD for each candidate predicted motion vector. If the current PU is bi-directionally predicted, the video encoder may generate two MVDs for each candidate predicted motion vector.
  • the first MVD may indicate a difference between a motion vector of the candidate predicted motion vector and a list 0 motion vector of the current PU.
  • the second MVD may indicate a difference between a motion vector of the candidate prediction motion vector and a list 1 motion vector of the current PU.
  • the video encoder may select one or more of the candidate predicted motion vectors from the candidate predicted motion vector list (215).
  • the video encoder can select one or more candidate predicted motion vectors in various ways. For example, the video encoder may select a candidate predicted motion vector of the associated motion vector that matches the motion vector to be encoded with minimal error, which may reduce the number of bits needed to represent the motion vector difference for the candidate predicted motion vector.
• the video encoder may output one or more reference image indices for the current PU, one or more candidate predicted motion vector indices, and one or more motion vector differences for the one or more selected candidate predicted motion vectors (216).
• If the current PU is predicted in a single direction, the video encoder may output a reference image index for list 0 ("ref_idx_l0") or a reference image index for list 1 ("ref_idx_l1").
• the video encoder may also output a candidate predicted motion vector index ("mvp_l0_flag") indicating the location in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU.
• Alternatively, the video encoder may output a candidate predicted motion vector index ("mvp_l1_flag") indicating the location in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU.
  • the video encoder may also output an MVD for the list 0 motion vector or list 1 motion vector of the current PU.
• If the current PU is bi-directionally predicted, the video encoder may output a reference image index for list 0 ("ref_idx_l0") and a reference image index for list 1 ("ref_idx_l1").
• the video encoder may also output a candidate predicted motion vector index ("mvp_l0_flag") indicating the location in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU.
• the video encoder may also output a candidate predicted motion vector index ("mvp_l1_flag") indicating the location in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU.
  • the video encoder may also output an MVD for the list 0 motion vector of the current PU and an MVD for the list 1 motion vector of the current PU.
  • FIG. 7 is an exemplary flow diagram of motion compensation performed by a video decoder (e.g., video decoder 30) in an embodiment of the present application.
• the video decoder may receive an indication of the selected candidate predicted motion vector for the current PU (222). For example, the video decoder may receive a candidate predicted motion vector index indicating the location of the selected candidate predicted motion vector within the candidate predicted motion vector list for the current PU.
  • the video decoder may receive the first candidate predicted motion vector index and the second candidate predicted motion vector index.
  • the first candidate predicted motion vector index indicates the location of the selected candidate predicted motion vector for the list 0 motion vector of the current PU in the candidate predicted motion vector list.
  • the second candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
  • a single syntax element can be used to identify two candidate predicted motion vector indices.
  • the video decoder may generate a list of candidate predicted motion vectors for the current PU (224).
  • the video decoder can generate this candidate predicted motion vector list for the current PU in various ways.
  • the video decoder may generate a candidate predicted motion vector list for the current PU using the techniques described below with reference to Figures 8-12.
• the video decoder may explicitly or implicitly set the reference image index identifying the reference image that includes the co-located PU, as previously described with respect to FIG. 5.
• the video decoder may determine the motion information of the current PU based on the motion information indicated by the one or more selected candidate predicted motion vectors in the candidate predicted motion vector list for the current PU (225).
• For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate predicted motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may use the one or more motion vectors indicated by the selected candidate predicted motion vector(s) and the one or more MVDs indicated in the code stream to reconstruct the one or more motion vectors of the current PU.
  • the reference image index and the prediction direction indicator of the current PU may be the same as the reference image index and the prediction direction indicator of the one or more selected candidate prediction motion vectors.
  • the video decoder may generate a predictive image block for the current PU based on the one or more reference image blocks indicated by the motion information for the current PU (226).
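• Step 225 can be illustrated by contrasting how the two modes recover the motion vector. This is an assumed sketch, not the patent's procedure: in merge mode the candidate's motion information is inherited unchanged, while in AMVP mode the signalled MVD is added to the selected predictor.

```python
# Sketch of decoder-side motion vector reconstruction for merge vs AMVP.

def reconstruct_mv(mode, candidates, idx, mvd=None):
    """candidates: the candidate predicted motion vector list the decoder
    rebuilt; idx: the signalled candidate index; mvd: the signalled
    motion vector difference (AMVP only)."""
    px, py = candidates[idx]
    if mode == "merge":
        return (px, py)                    # inherit candidate unchanged
    if mode == "amvp":
        return (px + mvd[0], py + mvd[1])  # predictor + MVD
    raise ValueError(mode)

cands = [(4, 0), (9, -2)]
assert reconstruct_mv("merge", cands, 0) == (4, 0)
assert reconstruct_mv("amvp", cands, 1, mvd=(-1, 1)) == (8, -1)
```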
• FIG. 8 is an exemplary schematic diagram of a coding unit and the candidate prediction motion vector locations associated with it, illustrating CU 250 and exemplary candidate predicted motion vector locations 252A through 252E associated with CU 250.
  • the present application may collectively refer to candidate predicted motion vector locations 252A through 252E as candidate predicted motion vector locations 252.
  • the candidate predicted motion vector position 252 represents a spatial candidate predicted motion vector in the same image as the CU 250.
• the candidate predicted motion vector location 252A is located to the left of CU 250.
  • the candidate predicted motion vector location 252B is located above the CU 250.
  • the candidate predicted motion vector position 252C is located at the upper right of the CU 250.
  • the candidate predicted motion vector position 252D is located at the lower left of the CU 250.
• the candidate predicted motion vector position 252E is located at the upper left of the CU 250. FIG. 8 is an illustrative implementation that provides a way in which the inter prediction module 121 and the motion compensation module 162 can generate a list of candidate predicted motion vectors. The implementation below is explained with reference to inter prediction module 121, but it should be understood that motion compensation module 162 can implement the same techniques and thus generate the same candidate predicted motion vector list.
  • FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in the embodiment of the present application.
  • the technique of FIG. 9 will be described with reference to a list including five candidate predicted motion vectors, but the techniques described herein may also be used with lists of other sizes.
  • the five candidate predicted motion vectors may each have an index (eg, 0 to 4).
• the technique of FIG. 9 will be described with reference to a generic video decoder.
• the generic video decoder may illustratively be a video encoder (e.g., video encoder 20) or a video decoder (e.g., video decoder 30).
  • the video decoder first considers four spatial candidate predicted motion vectors (902).
  • the four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D.
  • the four spatial candidate prediction motion vectors correspond to motion information of four PUs in the same image as the current CU (eg, CU 250).
  • the video decoder may consider four spatial candidate prediction motion vectors in the list in a particular order. For example, the candidate predicted motion vector location 252A can be considered first. If the candidate predicted motion vector location 252A is available, the candidate predicted motion vector location 252A may be assigned to index 0.
  • the video decoder may not include the candidate predicted motion vector location 252A in the candidate predicted motion vector list.
  • Candidate predicted motion vector locations may not be available for a variety of reasons. For example, if the candidate predicted motion vector location is not within the current image, the candidate predicted motion vector location may not be available. In another possible implementation, if the candidate predicted motion vector location is intra predicted, the candidate predicted motion vector location may not be available. In another possible implementation, if the candidate predicted motion vector location is in a different strip than the current CU, the candidate predicted motion vector location may not be available.
  • the video decoder may next consider the candidate predicted motion vector location 252B. If the candidate predicted motion vector location 252B is available and different than the candidate predicted motion vector location 252A, the video decoder may add the candidate predicted motion vector location 252B to the candidate predicted motion vector list.
• the terms "identical" and "different" refer to the motion information associated with the candidate predicted motion vector locations. Therefore, two candidate predicted motion vector locations are considered identical if they have the same motion information, and are considered different if they have different motion information. If the candidate predicted motion vector location 252A is not available, the video decoder may assign the candidate predicted motion vector location 252B to index 0.
• Otherwise, the video decoder may assign the candidate predicted motion vector location 252B to index 1. If the candidate predicted motion vector location 252B is not available or is the same as the candidate predicted motion vector location 252A, the video decoder skips the candidate predicted motion vector location 252B and does not include it in the candidate predicted motion vector list.
  • the candidate predicted motion vector location 252C is similarly considered by the video decoder for inclusion in the list. If the candidate predicted motion vector location 252C is available and not the same as the candidate predicted motion vector locations 252B and 252A, the video decoder assigns the candidate predicted motion vector location 252C to the next available index. If the candidate predicted motion vector location 252C is not available or is not different than at least one of the candidate predicted motion vector locations 252A and 252B, the video decoder does not include the candidate predicted motion vector location 252C in the candidate predicted motion vector list. Next, the video decoder considers the candidate predicted motion vector location 252D.
  • if the candidate predicted motion vector location 252D is available and different from the candidate predicted motion vector locations 252A, 252B, and 252C, the video decoder assigns it to the next available index. If the candidate predicted motion vector location 252D is not available, or is the same as at least one of the candidate predicted motion vector locations 252A, 252B, and 252C, the video decoder does not include the candidate predicted motion vector location 252D in the candidate predicted motion vector list.
  • the above embodiments exemplarily describe considering candidate predicted motion vectors 252A through 252D individually for inclusion in the candidate predicted motion vector list, but in some implementations, all candidate predicted motion vectors 252A through 252D may first be added to the candidate predicted motion vector list, with the duplicates removed from the list afterwards.
  • after the spatial candidates are considered, the candidate predicted motion vector list may include four spatial candidate predicted motion vectors, or it may include fewer than four. If the list includes four spatial candidate predicted motion vectors (904, YES), the video decoder considers the temporal candidate predicted motion vector (906).
  • the temporal candidate predicted motion vector may correspond to motion information of a co-located PU in an image different from the current image. If the temporal candidate predicted motion vector is available and different from the first four spatial candidate predicted motion vectors, the video decoder assigns the temporal candidate predicted motion vector to index 4.
  • if the temporal candidate predicted motion vector is not available or is the same as one of the first four spatial candidate predicted motion vectors, the video decoder does not include the temporal candidate predicted motion vector in the candidate predicted motion vector list.
  • at this point, the candidate predicted motion vector list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the temporal candidate predicted motion vector considered at block 906), or it may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902). If the candidate predicted motion vector list includes five candidate predicted motion vectors (908, YES), the video decoder completes building the list.
  • if the candidate predicted motion vector list includes four candidate predicted motion vectors (908, NO), the video decoder may consider the fifth spatial candidate predicted motion vector (910).
  • the fifth spatial candidate predicted motion vector may, for example, correspond to the candidate predicted motion vector location 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vectors at locations 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate predicted motion vector to the candidate predicted motion vector list and assign it to index 4.
  • if the candidate predicted motion vector at location 252E is not available, or is the same as one of the candidate predicted motion vectors at locations 252A through 252D, the video decoder may not include it in the candidate predicted motion vector list.
  • the list may then include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the fifth spatial candidate predicted motion vector considered at block 910), or it may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902).
  • if the candidate predicted motion vector list includes five candidate predicted motion vectors (912, YES), the video decoder finishes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes four candidate predicted motion vectors (912, NO), the video decoder adds artificially generated candidate predicted motion vectors (914) until the list includes five candidate predicted motion vectors (916, YES).
  • the video decoder may consider the fifth spatial candidate prediction motion vector (918).
  • the fifth spatial candidate predicted motion vector may, for example, correspond to the candidate predicted motion vector location 252E. If the candidate predicted motion vector at location 252E is available and different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may add the fifth spatial candidate predicted motion vector to the candidate predicted motion vector list and assign it to the next available index.
  • if the candidate predicted motion vector at location 252E is not available, or is the same as one of the candidate predicted motion vectors already included in the list, the video decoder may not include it in the candidate predicted motion vector list. The video decoder may then consider the temporal candidate predicted motion vector (920). If the temporal candidate predicted motion vector is available and different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may add it to the list and assign it to the next available index. If the temporal candidate predicted motion vector is not available, or is the same as one of the candidate predicted motion vectors already included in the list, the video decoder may not include it in the candidate predicted motion vector list.
  • if the candidate predicted motion vector list includes five candidate predicted motion vectors (922, YES), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes fewer than five candidate predicted motion vectors (922, NO), the video decoder adds artificially generated candidate predicted motion vectors (914) until the list includes five candidate predicted motion vectors (916, YES).
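The list-construction flow described above can be summarized as: try each candidate in order, keep it only if it is available and its motion information differs from every candidate already listed, and fill the remainder with artificially generated candidates. A minimal sketch, assuming motion information is modeled as hashable tuples; the list size of five is taken from the embodiment above, while the function and argument names are illustrative only:

```python
MAX_CANDIDATES = 5  # list size used in the embodiment of FIG. 9

def build_merge_list(spatial, temporal, extra_spatial, artificial):
    """spatial: candidates 252A-252D in order (None if unavailable);
    temporal: the co-located candidate; extra_spatial: candidate 252E;
    artificial: artificially generated fill candidates."""
    cand_list = []

    def try_add(c):
        # A candidate is added only if it is available (not None), its
        # motion information differs from every listed candidate, and
        # the list is not yet full.
        if c is not None and c not in cand_list and len(cand_list) < MAX_CANDIDATES:
            cand_list.append(c)

    for c in spatial:          # up to four spatial candidates
        try_add(c)
    try_add(temporal)          # temporal (co-located) candidate
    try_add(extra_spatial)     # fifth spatial candidate (252E)
    for c in artificial:       # artificially generated candidates
        try_add(c)
    return cand_list
```

For example, with a duplicate spatial candidate and one unavailable position, the duplicate and the `None` entry are skipped and artificial candidates top the list up to five entries.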
  • an additional merge candidate predicted motion vector may be artificially generated after the spatial and temporal candidate predicted motion vectors, to fix the size of the merge candidate predicted motion vector list at a specified number of merge candidate predicted motion vectors (for example, five in the possible embodiment of FIG. 9 above).
  • the additional merge candidate predicted motion vectors may include, exemplarily, a combined bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 1), a scaled bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 2), and a zero-vector Merge/AMVP candidate predicted motion vector (candidate predicted motion vector 3).
  • FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
  • the combined bi-predictive merge candidate predicted motion vector may be generated by combining original merge candidate predicted motion vectors.
  • specifically, two of the original candidate predicted motion vectors (which have mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate a bi-predictive merge candidate predicted motion vector.
  • in FIG. 10, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list-0 unidirectional prediction, and the prediction type of the other candidate predicted motion vector is list-1 unidirectional prediction. mvL0_A and ref0 are picked up from list 0, and mvL1_B and ref0 are picked up from list 1.
  • a bi-predictive merge candidate predicted motion vector (which has mvL0_A and ref0 in list 0, and mvL1_B and ref0 in list 1) may then be generated and checked against the candidate predicted motion vectors already included in the candidate predicted motion vector list. If it is different from them, the video decoder may add the bi-predictive merge candidate predicted motion vector to the candidate predicted motion vector list.
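As an illustration of this combination step, the list-0 part of one uni-predictive candidate can be paired with the list-1 part of another. The `Candidate` container and its field names below are assumptions made for the sketch, not the codec's actual data structures:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Candidate:
    mvL0: Optional[Tuple[int, int]] = None   # motion vector for list 0
    refIdxL0: Optional[int] = None           # reference index for list 0
    mvL1: Optional[Tuple[int, int]] = None   # motion vector for list 1
    refIdxL1: Optional[int] = None           # reference index for list 1

def combine_bi_predictive(a: Candidate, b: Candidate) -> Optional[Candidate]:
    # Take the list-0 motion of one candidate and the list-1 motion of the
    # other; if either half is missing, no combined candidate is produced.
    if a.mvL0 is not None and b.mvL1 is not None:
        return Candidate(mvL0=a.mvL0, refIdxL0=a.refIdxL0,
                         mvL1=b.mvL1, refIdxL1=b.refIdxL1)
    return None

# e.g. mvL0_A with ref0 from list 0, mvL1_B with ref0 from list 1 (values invented):
cand = combine_bi_predictive(Candidate(mvL0=(4, -2), refIdxL0=0),
                             Candidate(mvL1=(-1, 3), refIdxL1=0))
```

The resulting candidate would then be checked for duplication before being appended to the list, as described above.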
  • FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
  • the scaled bi-predictive merge candidate predicted motion vector may be generated by scaling an original merge candidate predicted motion vector.
  • specifically, one candidate predicted motion vector from the original candidate predicted motion vectors (which may have mvLX and refIdxLX) may be used to generate a bi-predictive merge candidate predicted motion vector.
  • in FIG. 11, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list-0 unidirectional prediction, and the prediction type of the other is list-1 unidirectional prediction.
  • mvL0_A and ref0 may be picked up from list 0, and ref0 may be copied to reference index ref0' in list 1.
  • mvL0'_A can be calculated by scaling mvL0_A according to ref0 and ref0'. The scaling can depend on the POC (picture order count) distance.
  • a bidirectional predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) can be generated and checked for repetition. If it is not a duplicate, it can be added to the merge candidate prediction motion vector list.
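The scaling step can be sketched with a simplified linear POC-distance factor. Real codecs use fixed-point arithmetic with clipping, so this floating-point version only illustrates how the scaled vector depends on the POC distances; all POC values below are invented for the example:

```python
def scale_mv(mv, poc_cur, poc_ref, poc_ref_new):
    """Scale mv, originally pointing to the picture at poc_ref, so that it
    points to the picture at poc_ref_new, proportionally to POC distance."""
    factor = (poc_cur - poc_ref_new) / (poc_cur - poc_ref)
    return (round(mv[0] * factor), round(mv[1] * factor))

# mvL0_A = (8, -4) points from the current picture (POC 4) to ref0 (POC 0);
# copying ref0 to list 1 as ref0' (here assumed to be POC 8) mirrors the
# vector, since the POC distance changes sign:
mvL0_A = (8, -4)
mvL0p_A = scale_mv(mvL0_A, poc_cur=4, poc_ref=0, poc_ref_new=8)  # -> (-8, 4)
```

The pair (mvL0_A, ref0) in list 0 and (mvL0'_A, ref0') in list 1 would then form the scaled bi-predictive candidate, subject to the duplicate check described above.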
  • FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate motion vector list in the embodiment of the present application.
  • zero-vector merge candidate predicted motion vectors can be generated by combining a zero vector with a referable reference index. If a zero-vector candidate predicted motion vector is not a duplicate, it may be added to the merge candidate predicted motion vector list. For each generated merge candidate predicted motion vector, its motion information may be compared with the motion information of the previous candidate predicted motion vectors in the list.
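A minimal sketch of this zero-vector filling, assuming candidates are represented as (motion vector, reference index) tuples and a list size of five as in the embodiment above:

```python
def add_zero_candidates(cand_list, num_ref, max_len=5):
    """Append a zero motion vector paired with each referable reference
    index until the list is full, skipping duplicates."""
    for ref_idx in range(num_ref):
        zero = ((0, 0), ref_idx)      # (motion vector, reference index)
        if len(cand_list) >= max_len:
            break
        if zero not in cand_list:     # duplicate check against the list
            cand_list.append(zero)
    return cand_list

# starting from one existing candidate, two zero-vector candidates are appended:
lst = add_zero_candidates([((2, 1), 0)], num_ref=2)
```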
  • in a feasible implementation, the pruning operation may include comparing one or more new candidate predicted motion vectors with the candidate predicted motion vectors already in the candidate predicted motion vector list, and not adding a new candidate predicted motion vector that duplicates a candidate predicted motion vector already in the list.
  • the pruning operation can include adding one or more new candidate prediction motion vectors to the candidate prediction motion vector list and later removing the duplicate candidate prediction motion vectors from the list.
  • JVET-F1001-v2 describes improved inter-frame coding techniques in Section 2.3, in which several inter prediction methods are introduced, such as alternative temporal motion vector prediction (ATMVP) and spatial-temporal motion vector prediction (STMVP). The predicted motion vectors obtained by the above methods may be used as candidate predicted motion vectors in the Merge candidate motion vector list, the AMVP candidate motion vector list, or the other candidate motion vector lists mentioned above.
  • each candidate predicted motion vector corresponds to an index value, or a similar identifier.
  • Each index value corresponds to a binarized representation, or a binarized character string.
  • the binarized representation of the index value of the actual predicted motion vector is the indication information that needs to be transmitted from the encoding end to the decoding end. Using a reasonable binarization strategy to encode the index value can save coding bits and improve coding efficiency.
  • in general, each candidate predicted motion vector has a certain probability of being selected as the actual predicted motion vector at the encoding end. Assigning a shorter binarized string to the index value of a high-probability candidate motion vector, and a longer binarized string to the index value of a low-probability candidate motion vector, can save coding bits.
  • exemplarily, suppose there are three candidate predicted motion vectors, with index 0, index 1, and index 2, and suppose the indices of the predicted motion vectors actually selected by a group of to-be-processed blocks are, in order: index 0, index 1, index 1, index 1, index 1, index 0, index 2, index 1. If, following the strategy of assigning a shorter binarized string to the index value of the higher-probability candidate motion vector, index 1 corresponds to "1", index 0 corresponds to "00", and index 2 corresponds to "01" (the binarized strings "00" and "01" have length 2, and "1" has length 1), then the binarized strings required to encode the above group of predicted motion vectors are "00", "1", "1", "1", "1", "00", "01", "1", for a total length of 11. If the index values are encoded according to the opposite strategy, with index 2 corresponding to "1", index 0 corresponding to "00", and index 1 corresponding to "01", then the binarized strings required to encode the above group of predicted motion vectors are "00", "01", "01", "01", "01", "00", "1", "01", for a total length of 15. Therefore, the strategy of assigning shorter binarized strings to the index values of higher-probability candidate motion vectors generally requires fewer bits to encode the binarized strings.
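The bit-count comparison above can be reproduced directly, assuming the selected-index sequence and the two code assignments stated in the example:

```python
# Indices selected by the group of to-be-processed blocks, in order.
selected = [0, 1, 1, 1, 1, 0, 2, 1]

# Shorter strings for more frequent indices, and the opposite strategy.
freq_matched = {1: "1", 0: "00", 2: "01"}
opposite     = {2: "1", 0: "00", 1: "01"}

len_a = sum(len(freq_matched[i]) for i in selected)  # total length 11
len_b = sum(len(opposite[i]) for i in selected)      # total length 15
# matching code length to selection frequency saves 4 bits on this group
```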
  • the embodiment of the present application aims to: when the to-be-processed block has a plurality of candidate predicted motion vectors, use the similarity between the to-be-processed block and the reference image blocks indicated by its candidate predicted motion vectors as a priori knowledge, to assist in determining the coding mode of the identifiers of the candidate predicted motion vectors, thereby saving coding bits and improving coding efficiency. In a feasible implementation manner, since the pixel values of the to-be-processed block cannot be directly obtained at the decoding end, the similarity between the reconstructed pixel set around the to-be-processed block and the reconstructed pixel set corresponding to the reference image block is used to represent the similarity between the to-be-processed block and the reference image block indicated by a candidate predicted motion vector of the to-be-processed block.
  • the embodiment of the present application is applicable to a scenario in which a reference image block is determined from a plurality of reference image blocks of a block to be processed, and the identification information of the reference image block is encoded.
  • it is not limited whether the plurality of reference image blocks are derived from an inter-frame type prediction mode, an intra-frame type prediction mode, an inter-view prediction mode (Multi-view or 3D Video Coding), or an inter-layer prediction mode (Scalable Video Coding); it is likewise independent of the specific reference image block acquisition method (such as ATMVP, STMVP, or intra block copy mode), and of whether the motion vector indicating the reference image block belongs to the entire coding unit or to a certain sub-unit within the coding unit. Any method of obtaining candidate predicted motion vectors can achieve the technical effect of improving coding efficiency according to, or in combination with, the solution in the embodiment of the present application.
  • FIG. 13 is a schematic flowchart of an encoding method 1000 according to an embodiment of the present application.
  • for convenience, the acquisition modes of the candidate reference image blocks of the to-be-processed block are respectively referred to as mode 1, mode 2, mode 3, and so on. The foregoing acquisition manners include different prediction methods, such as ATMVP and STMVP, as well as different operations using the same prediction method, such as obtaining the motion vector of the left neighboring block in Merge mode and obtaining the motion vector of the upper neighboring block in Merge mode. That is, different acquisition methods are represented by different modes, and each mode corresponds to one motion vector and one identifier value.
  • the above motion vector includes: the motion vector used in traditional inter prediction; the displacement vector (within the same frame) used to represent the relationship between the to-be-processed block and the reference image block when motion estimation is used in intra prediction; the vector used to characterize the matching relationship between viewpoints in inter-view prediction; and the vector used to represent the matching relationship between different layers when inter-layer prediction is used. All of these are collectively referred to as motion vectors and are used to obtain a reference image block of the to-be-processed block.
  • each motion vector also corresponds to one piece of reference frame information, and the reference image block indicated by the motion vector comes from the reference frame indicated by the reference frame information. Reference frame information has different representations in different application scenarios.
  • for example, reference frame information may be used to represent a reconstructed time-domain reference frame: when the motion vector of the left neighboring block is acquired in Merge mode, the reference frame information of the left neighboring block also needs to be acquired, and the corresponding reference image block is determined according to the motion vector within the reference frame determined by the reference frame information.
  • when motion estimation is used in intra prediction, the reference frame is the current frame, and in this case the reference frame information can be omitted.
  • the reference frame information may also be used to represent reconstructed frames of different viewpoints at a different time or at the same time, or reconstructed frames of different layers that have been reconstructed at a different time or at the same time.
  • the reference frame information may be an index value or a 0 or 1 flag according to the application scenario.
  • the identifier value corresponding to each mode is used to distinguish various modes, and may be an index value or an identity identifier, which is not limited. Illustratively, the following correspondence may be established to facilitate the description of the subsequent scheme.
  • the plurality of candidate prediction motion information may constitute a set, may exist in the form of a list, may exist in the form of a list of complements, or may exist in the form of a subset, and is not limited.
  • an encoding method 1000 for predicting motion information of an image block to be processed includes:
  • S1001. Acquire N candidate predicted motion information of the to-be-processed image block, where N is an integer greater than 1.
  • the N candidate predicted motion information are different from each other. It should be understood that when the motion information includes a motion vector and reference frame information, "different from each other" also covers the case where the motion vectors are the same but the reference frame information differs. The technique of pruning has been introduced in the foregoing; it should be understood that in the process of obtaining the N candidate predicted motion information of the to-be-processed image block, a pruning operation is performed so that the N candidate motion information finally obtained are different from each other, and details are not described again.
  • in a feasible implementation manner, the acquiring of the N candidate predicted motion information of the to-be-processed image block includes: acquiring, in a preset order, motion information of N image blocks that are different from the to-be-processed image block and have a preset positional relationship with it, as the N candidate predicted motion information.
  • for example, as in the candidate list construction of the Merge mode specified in the H.265 standard described above, motion information of image blocks having a preset spatial-domain positional relationship with the to-be-processed image block (for example, 252A, 252B, 252C, 252D, and 252E) and/or a preset time-domain positional relationship (for example, the co-located position) is acquired in a certain order, and through pruning, N mutually different candidate predicted motion information are finally obtained.
  • in a feasible implementation manner, the acquiring of the N candidate predicted motion information of the to-be-processed image block includes: acquiring, in a preset order, motion information of M image blocks that are different from the to-be-processed image block and have a preset positional relationship with it, as M candidate predicted motion information, wherein the M candidate predicted motion information includes the N candidate predicted motion information, and M is an integer greater than N; determining a grouping manner of the M candidate predicted motion information; and determining, according to the grouping manner, the N candidate predicted motion information from the M candidate predicted motion information.
  • for example, in implementation 10011, seven kinds of candidate predicted motion information, motion information 0 to motion information 6, are obtained, and M is 7. The above seven candidate predicted motion information are then grouped. In one possible implementation, all seven candidate predicted motion information may be placed in one group.
  • alternatively, the seven candidate predicted motion information may be grouped according to preset number intervals, for example into the first three, the middle three, and the last one; or into the first two, the middle three, and the last two; or into the first three and the last four. The number of groups and the number of candidate predicted motion information included in each group are not limited.
  • alternatively, the candidate predicted motion information may be grouped according to how the motion information is acquired: the candidate predicted motion information obtained from spatial neighboring blocks may be classified into one group, and the candidate predicted motion information obtained from time-domain neighboring blocks into another group.
  • alternatively, the candidate predicted motion information may include motion information at the coding unit level (CU level) and motion information at the sub-CU level; motion vector prediction based on sub-coding units is described in Section 2.3.1 of JVET-F1001-v2 and is not described here. The candidates can then be grouped according to CU level and sub-CU level. Specifically, it may be assumed that the seven candidate predicted motion information are divided into two groups: the first group is motion information 0-2, and the second group is motion information 3-6.
  • for example, the indices 0-2 corresponding to the motion information 0-2 of the first group may be assigned binarized character strings in the manner used for the index values of candidate predicted motion vectors in the Merge mode specified in the H.265 standard described above (hereinafter referred to as the conventional manner), that is, the indices 0-2 are given binarized character strings in a preset order; and the indices 3-6 corresponding to the motion information 3-6 of the second group are given binarized character strings by the method described in steps S1002-S1005 (hereinafter briefly referred to as the manner of the embodiment of the present application).
  • it should be noted that the assigned binarized character string simultaneously characterizes two kinds of information: the group identifier of the corresponding motion information, and the identifier of the motion information within the group. Therefore, the binarized string of any mode in a group can distinguish it from all of the other six modes.
  • that is, the motion information 0-2 of the first group is processed in the conventional manner (exemplarily, index 0 corresponds to the binarized character string "0", index 1 corresponds to the binarized character string "10", and index 2 corresponds to the binarized character string "110"), and the motion information 3-6 of the second group is processed in the manner of the embodiment of the present application, where N is 4.
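A hedged sketch of such a two-group assignment: the first group keeps the strings "0", "10", "110" given above, and, as one hypothetical choice not specified by the text, the second group's strings are prefixed with "111" followed by a two-bit in-group identifier. Every string then identifies both the group and the index within the group, and the whole code remains prefix-free:

```python
# First group: conventional strings as stated in the example above.
group1 = {0: "0", 1: "10", 2: "110"}
# Second group: hypothetical "111" group prefix plus a 2-bit in-group index.
group2 = {i: "111" + format(i - 3, "02b") for i in range(3, 7)}

codes = {**group1, **group2}

def prefix_free(codebook):
    """True if no codeword is a proper prefix of another (uniquely decodable)."""
    vals = list(codebook.values())
    return not any(a != b and b.startswith(a) for a in vals for b in vals)

# prefix_free(codes) holds, so any of the seven modes is distinguishable
```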
  • in a feasible implementation, the grouping manner is encoded into the code stream. For example, each of the various grouping manners, such as the implementation manners 100121-100124, may be assigned an identifier, and encoding the identifier of the grouping manner into the code stream enables the decoding end to know the grouping manner used at the encoding end.
  • alternatively, the grouping manner may be fixed and kept consistent at the encoding end and the decoding end by a preset protocol, so that the grouping manner does not need to be encoded into the code stream.
  • in a feasible implementation, first identification information indicating the N candidate predicted motion information is encoded into the code stream; or, second identification information indicating an image block having a preset positional relationship with the to-be-processed image block is encoded into the code stream.
  • exemplarily, each image block from which the candidate predicted motion information is taken and which has a preset positional relationship (such as a spatial neighboring block or a time-domain co-located block in Merge mode) is assigned a number, and the number is encoded into the code stream.
  • alternatively, third identification information having a preset correspondence with the N candidate predicted motion information is encoded into the code stream; specifically, each preset combination of candidate motion information is assigned a number, and the number is encoded into the code stream.
  • alternatively, the candidate predicted motion information may be fixed at the encoding end and the decoding end by a preset protocol, so that the candidate predicted motion information does not need to be encoded into the code stream.
  • exemplarily, each packet may be coded with a 0 or 1 identifier to indicate whether the current packet is processed in the conventional manner or in the manner of the embodiment of the present application; or a syntax element may include a representation of the processing manner of each packet; or the processing manner may be fixed at the encoding end and the decoding end by a preset protocol, for example, it may be agreed that the first packet is processed in the conventional manner and the second packet is processed in the manner of the embodiment of the present application, or it may be agreed that processing is performed in the manner of the embodiment of the present application when there is only one packet. This is not limited.
  • S1002 Determine an adjacent reconstructed image block of the image block to be processed.
  • the adjacent reconstructed image blocks of the to-be-processed image block may include: a spatially adjacent reconstructed image block located in the same frame image as the to-be-processed image block; a time-domain adjacent reconstructed image block at the co-located position in a different frame image; a reconstructed image block at the co-located position in a frame image of an adjacent viewpoint at the same time; and a reconstructed image block at the co-located position in a frame image of an adjacent layer at the same time. This is not limited.
  • "adjacent reconstructed image blocks are available" means that the adjacent reconstructed image blocks can be used by the current method.
  • generally, when the left boundary of the to-be-processed image block is not an image boundary, the left neighboring block of the to-be-processed image block is available; when the upper boundary of the to-be-processed image block is not an image boundary, the upper neighboring block of the to-be-processed image block is available.
  • whether adjacent reconstructed image blocks are available is further related to the configuration of other encoding tools.
  • for example, when the left boundary of the to-be-processed image block is not an image boundary, but the left boundary is a boundary of an image block group, such as a boundary of a slice or a tile, then, depending on the independence relationship between the image block group and the left adjacent image block group, the left neighboring block of the to-be-processed image block may still be unavailable (corresponding to the case where the image block groups are completely independent).
  • conversely, when the left boundary of the to-be-processed image block is an image boundary, but another encoding tool is configured to pad the image by interpolation outside the image boundary, the left neighboring block of the to-be-processed image block may be available.
  • in a feasible implementation, determining that the adjacent reconstructed image blocks of the to-be-processed image block are available includes: determining that at least one of at least two original adjacent reconstructed image blocks is available. For example, the original adjacent reconstructed image blocks of the to-be-processed image block include the upper adjacent reconstructed image block and the left adjacent reconstructed image block; if either of the upper adjacent reconstructed image block and the left adjacent reconstructed image block is available, it is determined that adjacent reconstructed image blocks of the to-be-processed image block are available.
  • it should be noted that the term "original adjacent reconstructed image block" is used to refer to the adjacent reconstructed image block of the to-be-processed image block, to distinguish it from the "reference adjacent reconstructed image block" mentioned in the following text, which refers to the adjacent reconstructed image block of the reference image block.
  • when adjacent reconstructed image blocks are not available, the method of the embodiment of the present application cannot use the similarity between the reconstructed pixel set corresponding to the to-be-processed block and the reconstructed pixel set corresponding to the reference image block to characterize the similarity between the to-be-processed block and the reference image blocks indicated by the candidate predicted motion vectors of the to-be-processed block.
  • furthermore, identification information is required to encode the auxiliary information such as the foregoing grouping manner and/or the processing manner of each packet. Therefore, the availability of the adjacent reconstructed image blocks of the to-be-processed image block may be determined first; when adjacent reconstructed image blocks are not available, encoding can be performed directly in the conventional manner without encoding the above auxiliary information, thereby saving coded bits.
  • the distortion value is used to measure the similarity between the reconstructed pixel set around the to-be-processed block (the original adjacent reconstructed image block) and the reconstructed pixel set corresponding to the reference image block (the reference adjacent reconstructed image block).
  • the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the image block to be processed.
  • the reference adjacent reconstructed image block is identical in shape and equal in size to the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
  • the image block to be processed is rectangular.
  • the width of the image block to be processed is W and the height is H.
• the original adjacent reconstructed image block is a rectangle.
• in a feasible implementation 10031, a lower boundary of the original adjacent reconstructed image block is adjacent to an upper boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of W and a height of n.
• in a feasible implementation 10032, a lower boundary of the original adjacent reconstructed image block is adjacent to an upper boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of W+H and a height of n. In a feasible implementation 10033, a right boundary of the original adjacent reconstructed image block is adjacent to a left boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of n and a height of H. In a feasible implementation 10034, a right boundary of the original adjacent reconstructed image block is adjacent to a left boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of n and a height of W+H. Here W, H, and n are positive integers. It should be understood that the choice of shape and size of the original adjacent reconstructed image block trades off implementation complexity against the accuracy of the similarity estimation; the encoder and decoder keep the choice consistent by protocol, and it is not limited here.
  • n may be set to 1 or 2, so that no additional storage space is needed to store the original adjacent reconstructed image blocks, which simplifies the hardware implementation.
• the reference adjacent reconstructed image block and the original adjacent reconstructed image block have the same shape, the same size, and the same positional relationship; thus the feasible implementations of the reference adjacent reconstructed image block and of the corresponding original adjacent reconstructed image
• block are exactly the same.
• the reference adjacent reconstructed image block of the reference image block of the image block to be processed may be determined, according to the motion compensation method described above, from the reference frame information and the motion vector represented by the candidate predicted motion information.
• the motion vector in the candidate prediction motion information may point to a sub-pixel position in the reference frame; in this case, the reference frame image, or the part of it that is used, needs to be interpolated to sub-pixel precision.
• an 8-tap filter such as {-1, 4, -11, 40, 40, -11, 4, -1} can be used for sub-pixel interpolation, or, to reduce computational complexity, a bilinear interpolation filter can be used for sub-pixel interpolation.
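As an illustration of the low-complexity bilinear alternative, the following sketch (hypothetical helper name, not from the patent) interpolates one sample at a fractional position from its four integer-pel neighbours:

```python
def bilinear_interp(ref, y, x):
    """Bilinear interpolation of one sample at fractional position (y, x).

    ref is a 2-D list of pixels; y and x may be fractional (e.g. half-pel
    or quarter-pel positions). Hypothetical helper, shown only to
    illustrate the bilinear alternative to the 8-tap filter.
    """
    y0, x0 = int(y), int(x)
    dy, dx = y - y0, x - x0
    # Clamp the second sample so integer positions at the border still work.
    y1 = min(y0 + 1, len(ref) - 1)
    x1 = min(x0 + 1, len(ref[0]) - 1)
    return ((1 - dy) * (1 - dx) * ref[y0][x0] +
            (1 - dy) * dx * ref[y0][x1] +
            dy * (1 - dx) * ref[y1][x0] +
            dy * dx * ref[y1][x1])

# A half-pel position between four pixels is their average:
ref = [[10, 20],
       [30, 40]]
print(bilinear_interp(ref, 0.5, 0.5))  # 25.0
```

In practice the 8-tap filter gives a sharper interpolation, while the bilinear filter needs only two taps per direction, which is why the text offers it as the simplified option.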
• a difference characterization value between the reference adjacent reconstructed image block of the reference image block and the original adjacent reconstructed image block of the block to be processed is then calculated as the distortion value.
• the difference characterization value can be calculated in a variety of ways, such as the mean absolute error (MAD), the sum of absolute errors (SAD), the sum of squared errors (SSD), the mean sum of squared errors, the sum of absolute Hadamard-transformed errors (SATD), a normalized product correlation metric, or a similarity measure based on sequential similarity detection, and so on. The purpose of calculating the difference characterization value is to obtain the similarity (or matching degree) between the reference adjacent reconstructed image block of the reference image block and the original adjacent reconstructed image block of the corresponding to-be-processed block; any calculation method serving this purpose is therefore applicable to the embodiments of the present application, without limitation.
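A few of the listed metrics can be sketched in plain Python (hypothetical function names; any such metric may serve as the difference characterization value):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mad(a, b):
    """Mean absolute difference."""
    return sad(a, b) / len(a)

orig_template = [10, 12, 14, 16]  # original adjacent reconstructed pixels
ref_template = [11, 12, 13, 20]   # reference adjacent reconstructed pixels
print(sad(orig_template, ref_template))  # 1 + 0 + 1 + 4 = 6
print(ssd(orig_template, ref_template))  # 1 + 0 + 1 + 16 = 18
print(mad(orig_template, ref_template))  # 6 / 4 = 1.5
```

SAD is the cheapest and is the metric used in the worked encoder example later in this section; SSD penalizes large individual differences more strongly.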
• the plurality of original adjacent reconstructed image blocks include a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block.
• that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block. More generally, the distortion value is obtained according to the following calculation formula:
• Distortion = Σ_{i=1}^{p} Delta_i   (1)
• where Distortion represents the distortion value, Delta_i represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value. Delta may be any of the calculation methods above, such as MAD, SAD, or SSD.
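Formula (1) amounts to summing the per-template difference values; a minimal sketch with SAD as Delta (hypothetical names, templates given as flat pixel lists):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distortion(orig_templates, ref_templates, delta=sad):
    """Distortion = sum over i of Delta(orig_i, ref_i), as in formula (1)."""
    return sum(delta(o, r) for o, r in zip(orig_templates, ref_templates))

# e.g. p = 2: an upper template and a left template
orig = [[10, 20, 30], [5, 6]]
ref = [[12, 20, 29], [5, 9]]
print(distortion(orig, ref))  # (2 + 0 + 1) + (0 + 3) = 6
```

The same helper covers both the two-template case (upper plus left) and the sub-block case described below, since each sub-block contributes one (orig, ref) template pair.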
  • the embodiment of the present invention is applied to inter-frame bidirectional prediction.
• the reference image block indicated by the candidate prediction motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image blocks of the reference image blocks indicated by the candidate predicted motion information include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by calculating the pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by the average of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
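For bidirectional prediction, the two variants just described can be sketched as follows (hypothetical names; SAD assumed as the difference metric):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distortion_avg_template(orig, ref0, ref1):
    """Variant 1: average the two reference templates pixel-wise, then compare."""
    avg = [(p0 + p1) / 2 for p0, p1 in zip(ref0, ref1)]
    return sad(orig, avg)

def distortion_avg_value(orig, ref0, ref1):
    """Variant 2: average the two per-direction difference values."""
    return (sad(orig, ref0) + sad(orig, ref1)) / 2

orig = [10, 20, 30]
ref0 = [12, 20, 28]  # template from the first reference picture
ref1 = [8, 20, 32]   # template from the second reference picture
print(distortion_avg_template(orig, ref0, ref1))  # averaged template equals orig -> 0.0
print(distortion_avg_value(orig, ref0, ref1))     # (4 + 4) / 2 = 4.0
```

Note the two variants generally give different values: averaging the templates first lets opposite-sign errors in the two directions cancel, mirroring how bi-prediction itself averages the two reference blocks.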
• the image block to be processed may have candidate prediction motion information at the sub-block level; as shown in FIG. 15, each sub-block corresponds to the portion of the original adjacent reconstructed image block adjacent to that sub-block.
• the distortion values of the sub-blocks are summed as the distortion value of the image block to be processed.
• for example, the corresponding reference image blocks Ref Sub-CU1 and Ref Sub-CU2 are found according to the motion information of Sub-CU1 and Sub-CU2 in the to-be-processed block; the original adjacent reconstructed image blocks T1 and T2 and
• the corresponding reference adjacent reconstructed image blocks T1' and T2' are then determined, and finally the distortion value is obtained according to the method shown in formula (1).
• S1004 Determine, according to the magnitude relationship between the acquired N distortion values, first identification information of each of the N candidate prediction motion information; the N candidate prediction motion information corresponds one-to-one with the respective first identification information.
• the first identification information is a binarized representation, or binary string, of the identifier of each candidate prediction motion information.
• the magnitudes of the N distortion values are compared.
• the N candidate prediction motion information may be arranged in order of distortion value, from small to large or from large to small. For example, the N candidate prediction motion information may be arranged in ascending order of distortion value, that is, candidate motion information ranked earlier has a smaller distortion value.
• the first identification information of each of the N candidate prediction motion information is assigned according to the comparison result, wherein the length of the binary character string of the first identification information of candidate prediction motion information with a smaller distortion value is less than or equal to the length of the binary character string of the first identification information of candidate prediction motion information with a larger distortion value.
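The assignment rule above (smaller distortion, no longer codeword) can be sketched with the truncated-unary binarization used in the worked examples later in this section (hypothetical helper; candidates are identified by their index):

```python
def assign_codewords(distortions):
    """Sort candidate indices by ascending distortion and assign
    truncated-unary binary strings '1', '01', '001', ...; the last
    candidate gets the all-zero string, since no terminating '1' is
    needed once every other codeword has been ruled out."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    codes = {}
    for rank, cand in enumerate(order):
        if rank < len(order) - 1:
            codes[cand] = '0' * rank + '1'
        else:
            codes[cand] = '0' * rank
    return codes

# Distortions for candidates 0..3; candidate 2 is the best template match:
print(assign_codewords([50, 70, 10, 30]))
# {2: '1', 3: '01', 0: '001', 1: '000'}
```

With this prefix-free assignment the best-matching candidate costs one bit, so whenever the template-based ranking agrees with the encoder's actual choice, coded bits are saved relative to a fixed-order index.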
  • the motion information 0-2 is divided into the first group, and the motion information 3-6 is divided into the second group.
• the second group is processed in the manner of the embodiment of the present application; suppose the order of distortion values from small to large
• is mode 6, mode 4, mode 5, mode 3; then the following exemplary index coding methods exist:
• the latter group needs to take the earlier groups into account when determining the binary representation of the identifier values of the candidate prediction motion information within the group, so as to avoid codewords that cannot be distinguished from those of the earlier groups. It should also be understood that, in some embodiments, when the position of the group and the number of candidate prediction motion information within the group are known to the decoding end by other means, the earlier groups may be ignored when the later group determines the binary representation of the identifier values of the candidate prediction motion information within the group.
• the target predicted motion information of the to-be-processed image block is determined as one of the N candidate predicted motion information for which the first identification information has been determined, and
• the first identification information of the target predicted motion information is encoded into the code stream.
• coding is performed using each candidate prediction motion information; the entire coding process may be simulated, or part of the coding process may be completed (for example, only the reconstructed image block is obtained, without entropy coding), and the coding cost of each candidate prediction motion information is obtained.
• the coding cost is obtained by calculating the distortion of the reconstructed image block and/or the number of bits used to encode the image block. According to actual needs, suitable candidate prediction motion information, for example the candidate prediction motion information with the lowest coding cost described above, is selected as the target motion information actually coded, and its identifier (such as an index value) is encoded into the code stream according to the binary string determined in step S1004.
• the above process of determining the target prediction motion information is generally referred to as obtaining the target prediction motion information by using a rate-distortion optimization (RDO) criterion.
• for the specific steps and various feasible simplifications, refer to reference software such as HM and JEM; details are not described herein.
  • FIG. 16 is a schematic flowchart 1100 of a decoding method according to an embodiment of the present application. It should be understood that, in general, the decoding process is the inverse of the encoding process. In the encoding process, the syntax elements of the code stream are sequentially encoded, and the decoding end needs to be parsed in a corresponding order and position to complete the reconstruction of the video image at the decoding end. It is to be noted that the decoding method shown in FIG. 16 corresponds to the encoding method shown in FIG.
  • S1101 Determine a grouping manner of the M candidate prediction motion information.
• step S1101 is an optional step, corresponding to different embodiments.
• when the encoding end and the decoding end determine the grouping manner according to a pre-agreed protocol, the decoding end can learn the grouping manner through the protocol, and this step does not exist in actual execution.
• otherwise, this step needs to be performed.
• the specific grouping manner is consistent with the encoding end. For details, refer to the determination of the grouping manner of the M candidate prediction motion information in step S1001 and the exemplary description of determining the N candidate prediction motion information from the M candidate prediction motion information according to the grouping manner; details are not described again.
• the decoding end needs to know the processing manner of each group, or of the group in which the target prediction motion information is located.
• this may be learned according to the protocol agreed between the encoding and decoding ends, or by parsing identification information in the code stream. That is, a preset grouping manner is determined, or the grouping manner is obtained by parsing the code stream. For details, refer to the method described in Embodiment 10017; details are not described herein.
  • S1102 Parse target identification information of target predicted motion information of the to-be-processed image block from the code stream.
• the code stream can be parsed to obtain the identifier, that is, the binary string, of the candidate predicted motion information actually used for encoding.
  • This step corresponds to step S1002 of the encoding end, and the content remains the same.
  • the various possible implementations in S1002 can be used in S1103, and no further description is provided.
• the adjacent reconstructed image block of the image block to be processed includes at least two of the original adjacent reconstructed image blocks, and
• determining that the adjacent reconstructed image block of the image block to be processed is available includes determining that at least one of the at least two original adjacent reconstructed image blocks is available.
• identification information is required to encode auxiliary information such as the above-mentioned grouping manner and/or the processing manner of each group.
• the availability of the adjacent reconstructed image blocks of the image block to be processed may be determined first; when the adjacent reconstructed image blocks are not available, encoding can proceed directly in a conventional manner without further encoding the auxiliary information, thereby saving coding bits. Correspondingly, at the decoding end, the associated auxiliary information is no longer parsed when the adjacent reconstructed image blocks are not available.
• steps S1103 and S1102 have no mandatory order and may be processed in parallel, without limitation.
  • S1104 Determine N candidate predicted motion information.
  • the N candidate predicted motion information includes the target predicted motion information, where N is an integer greater than 1.
• the step includes: acquiring, in a preset order, N mutually different pieces of motion information of image blocks
• having a preset positional relationship with the to-be-processed image block, as the N candidate predicted motion information.
• alternatively, the step includes: determining a grouping manner of M candidate prediction motion information, where the M candidate prediction motion information includes the N candidate prediction motion information and M is an integer greater than N; and determining the N candidate prediction motion information from the M candidate prediction motion information according to the target identification information and the grouping manner.
  • This step corresponds to step S1001 of the encoding end, and the content remains the same.
  • Various possible implementations in S1001 can be used in S1104, and no further description is provided.
• the decoding end needs to know the specific candidate prediction motion information. It may be learned according to the protocol pre-agreed between the encoding and decoding ends, or by parsing the candidate prediction motion information or identification information in the code stream. That is, the encoding information of the plurality of candidate prediction motion information in the code stream is parsed to obtain the N candidate prediction motion information; or the second identification information in the code stream is parsed to obtain the N candidate image blocks indicated by the second identification information, and the motion information of the N candidate image blocks is used as the N candidate predicted motion information; or the third identification information in the code stream is parsed to obtain the N candidate predicted motion information having a preset correspondence with the third identification information. For details, refer to Embodiment 10015 and Embodiment 10016; details are not described again.
• the implementation at the decoding end may differ somewhat from that at the encoding end.
• at the encoding end, any candidate prediction motion information may become the target prediction motion information actually used for encoding; therefore, all candidate prediction motion information must be determined.
• at the decoding end, only the candidate prediction motion information needed to determine the target prediction motion information has to be determined; it is not necessary to determine all the candidate prediction motion information. In some application scenarios, this reduces the implementation complexity of the decoder.
• the index values of the candidate predicted motion information are assigned binary strings. It can be determined that the binary strings corresponding to the first group include "1" and "01", the binary strings corresponding to the second group include "001", "0001",
• and "00001", and the binary strings corresponding to the third group include "000001" and "000000". If the target motion information belongs to the second group, it is not necessary to determine the binary strings corresponding to the index values of the candidate motion information in the first group and the third group;
• only the binary strings corresponding to the index values of the candidate prediction motion information in the second group need to be determined.
  • This step corresponds to step S1003 of the encoding end, and the content remains the same.
  • Various possible implementation manners in S1003 can be used in S1105, and details are not described herein.
• the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed
• image block of the image block to be processed includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block; that the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the image block to be
• processed includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, the reference adjacent reconstructed image block having the same shape and equal size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block being the same as
• the positional relationship between the original adjacent reconstructed image block and the image block to be processed.
• the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute error of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of absolute errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of squared errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the mean sum of squared errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of absolute Hadamard-transformed errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a normalized product correlation metric of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or a similarity measure based on sequential similarity detection of the reference adjacent reconstructed image block and the original adjacent reconstructed image block.
  • the image block to be processed is a rectangle
  • the width of the image block to be processed is W
  • the height is H
  • the original adjacent reconstructed image block is a rectangle.
• that the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the image block to be processed includes: the original adjacent reconstructed image block has a width of W and a height of n; or the original adjacent reconstructed
• image block has a width of W+H and a height of n; where W, H, and n are positive integers.
• that the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the image block to be processed includes: the original adjacent reconstructed image block has a width of n
• and a height of H; or the original adjacent reconstructed image block has a width of n and a height of W+H.
• n is 1 or 2.
• the reference image block indicated by the candidate predicted motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image blocks of the reference image block indicated by the candidate predicted motion information
• include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by calculating the pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by the average of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
• the adjacent reconstructed image blocks of the reference image block indicated by the candidate predicted motion information include a plurality of reference adjacent reconstructed image blocks, and the plurality of reference adjacent
• reconstructed image blocks include a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block;
• the adjacent reconstructed image blocks of the image block to be processed include a plurality of original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  • the distortion value is obtained according to the following calculation formula:
• Distortion = Σ_{i=1}^{p} Delta_i   (1)
• where Distortion represents the distortion value, Delta_i represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
• S1106 Determine, according to the magnitude relationship between the acquired N distortion values, first identification information of each of the N candidate prediction motion information; the N candidate prediction motion information corresponds one-to-one with the respective first identification information.
  • This step corresponds to step S1004 of the encoding end, and the content remains the same.
  • the various possible implementation manners in S1004 can be used in S1106, and no further description is provided.
• that the first identification information of each of the N candidate prediction motion information is determined according to the magnitude relationship between the acquired N distortion values, the N candidate
• prediction motion information corresponding one-to-one with the respective first identification information, includes: comparing the magnitudes of the N distortion values; and assigning, according to the comparison result, the first identification information of each of the N candidate prediction motion information, where the length of the binary character string of the first identification information of candidate prediction motion information with a smaller distortion value is less than the length of the binary character string of the first identification information of candidate prediction motion information with a larger distortion value.
• comparing the magnitudes of the N distortion values includes: arranging the N candidate prediction motion information in order of distortion value, from small to large or from large to small.
  • the first identification information of the target prediction motion information and the target identification information are matched, and the candidate prediction motion information corresponding to the first identification information that matches the target identification information is determined as the target predicted motion information.
  • the first identification information of the target prediction motion information and the target identification information are matched, that is, the first identification information and the target identification information are equal.
• for example, the binarized character strings of the index values of the candidate prediction motion information obtained according to S1106 are: mode 2 "0001", mode 3 "001", mode 4 "00001". Combined with the identifier "001" of the target predicted motion information parsed in S1102, it can be determined that the target predicted motion information is the candidate predicted motion information corresponding to mode 3.
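On the decoding side, the match in this example reduces to a lookup of the parsed string among the reconstructed first identification information (a sketch; the mode names are taken from the example above, the helper name is hypothetical):

```python
def find_target(first_id_info, target_id):
    """Return the candidate whose first identification information equals
    the target identification information parsed from the code stream."""
    for candidate, bin_string in first_id_info.items():
        if bin_string == target_id:
            return candidate
    raise ValueError('no candidate matches the parsed identifier')

# Binarized index values as obtained in S1106:
first_id_info = {'mode 2': '0001', 'mode 3': '001', 'mode 4': '00001'}
print(find_target(first_id_info, '001'))  # 'mode 3'
```

Because the decoder derives the same distortion ranking from the same reconstructed templates, this lookup always yields the candidate the encoder selected, without any extra signalling beyond the binary string itself.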
  • the group information for determining the N candidate prediction motion information from the M candidate prediction motion information is represented by additional information.
• the encoding end transmits a group identifier to the decoding end so that the decoding
• end knows the group in which the target predicted motion information is located.
• in this case, the target identification information of the target predicted motion information parsed in S1102 may be used only to distinguish different candidate prediction motion information within the group, that is, as intra-group index information; matching the first identification information of the target predicted motion information with the target
• identification information then means combining the group identifier with the target identification information of the target predicted motion information to find the candidate predicted motion information represented by the corresponding first identification information.
• the similarity between the to-be-processed block and the reference image blocks indicated by the candidate prediction motion vectors of the to-be-processed block is used as a priori knowledge to assist in determining the identifier of each candidate prediction motion vector.
  • the motion information of the 252A position, the motion information of the 252B position, the motion information of the 252C position, the motion information of the 252D position, and the motion information obtained by using the ATMVP mode for 250 blocks are sequentially detected.
• the detection content includes: (1) whether the motion information is available (available here in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether, according to the properties of other coding tools, such as the prediction mode, the motion information can be used in the embodiment of the present application, etc.); (2) whether the motion information is repeated with previously detected motion information.
• motion information that is available and not repeated with previously detected motion information is sequentially acquired until the number reaches five; the five sequentially acquired pieces of motion information may be referred to as MV0, MV1, MV2, MV3, and MV4, respectively.
• it may be set that the size of the 250 block is 16x8, the adjacent reconstructed image block at the upper boundary of the 250 block (referred to as the upper template) is 16x1, and the adjacent reconstructed image block at the left boundary (referred to as the left template) is 1x8. It is detected whether the upper template and the left template of the 250 block exist.
  • the process ends, and the identifier of the predicted motion information of the to-be-processed block is encoded according to the prediction method of the Merge mode in the JEM reference software.
• S1204 Obtain the reference picture blocks REF0, REF1, and REF2 of the 250 block according to MV0, MV1, and MV2, respectively.
• since the templates of the 250 block exist, it may be set that the left template TL and the upper template TA of the 250 block both exist; the left template TL0 and the upper template TA0 corresponding to REF0 are determined, where TL0 and TL are equal in size and corresponding in position, and TA0 and TA are equal in size and corresponding in position.
  • the SAD value of the pixel values of TL0 and TL is calculated to obtain SAD01
  • the SAD value of the pixel values of TA0 and TA is calculated to obtain SAD02
  • SAD01 and SAD02 are added to obtain SAD0 as the distortion value corresponding to MV0.
• similarly, the distortion value SAD1 corresponding to MV1 and the distortion value SAD2 corresponding to MV2 are obtained.
• S1206. Arrange SAD0, SAD1, and SAD2 from small to large. It may be assumed that the order from small to large is SAD2 < SAD0 < SAD1.
  • MV2 corresponds to "1"
  • MV0 corresponds to "01”
  • MV1 corresponds to "001”.
  • S1208 assigns a bin string to MV3 and MV4 respectively:
  • MV3 corresponds to "0001" and MV4 corresponds to "0000".
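Steps S1204 through S1208 can be tied together in a small sketch (toy template values, not from the patent; the real templates are the 16x1 upper row and 1x8 left column of reconstructed pixels):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Upper (TA) and left (TL) templates of the 250 block, and the templates
# of the reference blocks found by MV0..MV2 (hypothetical values):
TA, TL = [4, 4, 4, 4], [7, 7]
templates = {
    'MV0': ([4, 4, 5, 5], [7, 8]),  # SAD0 = 2 + 1 = 3
    'MV1': ([6, 6, 6, 6], [9, 9]),  # SAD1 = 8 + 4 = 12
    'MV2': ([4, 4, 4, 4], [7, 7]),  # SAD2 = 0
}
# S1205: per-candidate distortion = SAD of upper template + SAD of left template
sads = {mv: sad(TA, ta) + sad(TL, tl) for mv, (ta, tl) in templates.items()}
order = sorted(sads, key=sads.get)  # S1206: here SAD2 < SAD0 < SAD1
codes = {mv: '0' * i + '1' for i, mv in enumerate(order)}  # S1207
codes.update({'MV3': '0001', 'MV4': '0000'})  # S1208: second group, fixed order
print(order)  # ['MV2', 'MV0', 'MV1']
print(codes)  # {'MV2': '1', 'MV0': '01', 'MV1': '001', 'MV3': '0001', 'MV4': '0000'}
```

The first group (MV0 to MV2) is reordered by template distortion, while the second group (MV3, MV4) keeps its acquisition order, matching the split described in S1207 and S1208.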
  • a decoding method 1300 of an embodiment of the present invention will be specifically described below. This embodiment corresponds to the encoding method 1200.
• it may be set that the size of the 250 block is 16x8, the adjacent reconstructed image block at the upper boundary of the 250 block (referred to as the upper template) is 16x1, and the adjacent reconstructed image block at the left boundary (referred to as the left template) is 1x8. It is detected whether the upper template and the left template of the 250 block exist.
  • If neither template exists, the process ends, and the identifier of the predicted motion information of the to-be-processed block is decoded according to the prediction method of the Merge mode in the JEM reference software.
  • S1302. Parse the code stream to obtain the bin string corresponding to the predicted motion information identifier of block 250, which may be assumed to be "001". According to the protocol preset at the decoding end, which is consistent with the encoding end, the first group has three pieces of motion information and the second group has two pieces of motion information. "001" represents the third piece of motion information of the first group (index value 2), so only the three pieces of motion information of the first group need to be determined; the motion information of the second group does not need to be determined.
  • The detection content includes: (1) whether the motion information is available ("available" here is used in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used in this embodiment of the present application according to the properties of other coding tools, such as the prediction mode); (2) whether the motion information duplicates previously detected motion information.
  • Available motion information that does not duplicate previously detected motion information is sequentially acquired until the number reaches the three determined in S1302, consistent with the encoding end. The three sequentially acquired pieces of motion information are called MV0, MV1, and MV2, respectively.
  • S1304. Obtain the reference picture blocks REF0, REF1, and REF2 of block 250 according to MV0, MV1, and MV2, respectively.
  • The left template TL and the upper template TA are both present. Determine the left template TL0 and the upper template TA0 corresponding to REF0; TL0 and TL are equal in size and correspond in position, and TA0 and TA are equal in size and correspond in position.
  • The SAD of the pixel values of TL0 and TL is calculated to obtain SAD01.
  • The SAD of the pixel values of TA0 and TA is calculated to obtain SAD02.
  • SAD01 and SAD02 are added to obtain SAD0 as the distortion value corresponding to MV0.
  • Similarly, the distortion value SAD1 corresponding to MV1 and the distortion value SAD2 corresponding to MV2 are obtained.
  • S1306. Arrange SAD0, SAD1, and SAD2 from small to large, consistent with the encoding end. The order from small to large is SAD2 < SAD0 < SAD1.
  • MV2 corresponds to "1";
  • MV0 corresponds to "01";
  • MV1 corresponds to "001".
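At the decoding end the same ordering is rebuilt and the parsed bin string then picks the candidate directly. A minimal sketch under the same assumptions as before (invented names; the SAD values are taken to be already computed as in S1304 through S1306):

```python
def select_candidate(parsed_bins, sads):
    """Re-derive the encoder's small-to-large SAD ordering, rebuild the
    '1'/'01'/'001' mapping, and return the index of the candidate that the
    parsed bin string identifies."""
    order = sorted(range(len(sads)), key=lambda i: sads[i])
    codes = {"0" * rank + "1": cand for rank, cand in enumerate(order)}
    return codes[parsed_bins]
```

With hypothetical SAD values satisfying SAD2 < SAD0 < SAD1, the parsed string "001" from S1302 resolves to MV1, matching the example.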
  • The motion information at the 252A position, the motion information at the 252B position, the motion information at the 252C position, the motion information at the 252D position, and the motion information obtained by using the ATMVP mode for block 250 are sequentially detected.
  • The detection content includes: (1) whether the motion information is available ("available" here is used in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used in this embodiment of the present application according to the properties of other coding tools, such as the prediction mode); (2) whether the motion information duplicates previously detected motion information.
  • Available motion information that does not duplicate previously detected motion information is sequentially acquired until the number reaches six; the six sequentially acquired pieces of motion information may be called MV0, MV1, MV2, MV3, MV4, and MV5, respectively.
  • The size of block 250 is 16x16. The adjacent reconstructed image block above the upper boundary of block 250 (abbreviated as the upper template) is 32x1, and the adjacent reconstructed image block to the left of the left boundary (referred to as the left template) is 1x32. It is detected whether the upper template and the left template of block 250 exist.
  • If neither template exists, the process ends, and the identifier of the predicted motion information of the to-be-processed block is encoded according to the prediction method of the Merge mode in the JEM reference software.
  • S1404. Obtain the reference picture blocks REF0, REF1, REF2, REF3, REF4, and REF5 of block 250 according to MV0, MV1, MV2, MV3, MV4, and MV5, respectively.
  • S1405. Taking MV0 as an example, perform the following operations on MV0, MV1, MV2, MV3, MV4, and MV5, respectively:
  • Among the templates existing for block 250, it may be assumed that only the upper template TA of block 250 exists. Determine the upper template TA0 corresponding to REF0; TA0 and TA are equal in size and correspond in position.
  • the SAD value of the pixel values of TA0 and TA is calculated to obtain SAD0 as the distortion value corresponding to MV0.
  • Similarly, the distortion value SAD1 corresponding to MV1, the distortion value SAD2 corresponding to MV2, the distortion value SAD3 corresponding to MV3, the distortion value SAD4 corresponding to MV4, and the distortion value SAD5 corresponding to MV5 are obtained.
  • S1406. Arrange SAD0, SAD1, and SAD2 from small to large. It may be assumed that the order from small to large is SAD2 < SAD0 < SAD1.
  • MV2 corresponds to "1";
  • MV0 corresponds to "01";
  • MV1 corresponds to "001".
  • S1408. Arrange SAD3, SAD4, and SAD5 from small to large. It may be assumed that the order from small to large is SAD5 < SAD3 < SAD4.
  • MV5 corresponds to "0001";
  • MV3 corresponds to "00001";
  • MV4 corresponds to "00000".
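In method 1400 both groups are sorted independently while the code ranks continue across the groups. A hypothetical sketch (list layout and the function name are assumed here):

```python
def assign_two_group_codes(sads_a, sads_b):
    """Sort each group by its own SAD values (small to large); ranks continue
    from the first group into the second, and only the overall last candidate
    drops its trailing '1' (truncation)."""
    codes = {}
    rank = 0
    total = len(sads_a) + len(sads_b)
    for group, sads in (("A", sads_a), ("B", sads_b)):
        # sorted(...) is evaluated immediately, so capturing `sads` is safe
        for cand in sorted(range(len(sads)), key=lambda i: sads[i]):
            codes[(group, cand)] = "0" * rank + ("1" if rank < total - 1 else "")
            rank += 1
    return codes
```

With hypothetical SAD values satisfying SAD2 < SAD0 < SAD1 in the first group and SAD5 < SAD3 < SAD4 in the second, this yields "1", "01", "001", "0001", "00001", "00000", exactly as in steps S1407 and S1409.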
  • a decoding method 1500 of an embodiment of the present invention will be specifically described below. This embodiment corresponds to the encoding method 1400.
  • The size of block 250 is 16×16. The adjacent reconstructed image block above the upper boundary of block 250 (referred to as the upper template) is 32×1, and the adjacent reconstructed image block to the left of the left boundary (referred to as the left template) is 1×32. It is detected whether the upper template and the left template of block 250 exist.
  • If neither template exists, the process ends, and the identifier of the predicted motion information of the to-be-processed block is decoded according to the prediction method of the Merge mode in the JEM reference software.
  • S1502. Parse the code stream to obtain the bin string corresponding to the predicted motion information identifier of block 250, which may be assumed to be "0001". According to the protocol preset at the decoding end, which is consistent with the encoding end, the first group has three pieces of motion information and the second group has three pieces of motion information. "0001" indicates the fourth candidate, which falls in the second group, so only the three pieces of motion information of the second group need to be determined, and the steps corresponding to steps S1408 and S1409 may be performed.
  • The detection content includes: (1) whether the motion information is available ("available" here is used in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used in this embodiment of the present application according to the properties of other coding tools, such as the prediction mode); (2) whether the motion information duplicates previously detected motion information.
  • Available motion information that does not duplicate previously detected motion information is sequentially acquired until the number reaches the six determined in S1502, consistent with the encoding end; the six sequentially acquired pieces of motion information are MV0, MV1, MV2, MV3, MV4, and MV5, respectively.
  • S1506. Obtain the reference picture blocks REF3, REF4, and REF5 of block 250 according to MV3, MV4, and MV5, respectively.
  • S1507. Taking MV3 as an example, perform the following operations on MV3, MV4, and MV5, respectively:
  • the SAD value of the pixel values of TA3 and TA is calculated to obtain SAD3 as the distortion value corresponding to MV3.
  • Similarly, the distortion value SAD4 corresponding to MV4 and the distortion value SAD5 corresponding to MV5 are obtained.
  • S1508. Arrange SAD3, SAD4, and SAD5 from small to large, consistent with the encoding end. The order from small to large is SAD5 < SAD3 < SAD4.
  • S1509. Assign a bin string to each of MV3, MV4, and MV5 according to the magnitude relationship of the distortion values:
  • MV5 corresponds to "0001";
  • MV3 corresponds to "00001";
  • MV4 corresponds to "00000".
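Before computing any SAD, the decoder in method 1500 only needs to know which group the parsed bin string falls in: counting leading zeros gives the rank, and comparing the rank against the cumulative group sizes gives the group. A hypothetical sketch (the function name is assumed here):

```python
def group_of(parsed_bins, group_sizes):
    """Rank = number of leading zeros; a string with no '1' at all is the
    truncated last code. Only the group containing that rank needs its
    distortion values derived; the other group's SADs are never computed."""
    rank = parsed_bins.index("1") if "1" in parsed_bins else len(parsed_bins)
    seen = 0
    for g, size in enumerate(group_sizes):
        seen += size
        if rank < seen:
            return g
    raise ValueError("bin string does not match any candidate")
```

For "0001" with two groups of three candidates each, the result is the second group, so only the steps corresponding to S1506 through S1509 are run, as stated in S1502.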
  • The method of the embodiments of the present application may be used for the establishment of a candidate predicted motion information list in the Merge mode or the AMVP mode of the H.265 standard, of the H.266 standard under development, or of other inter prediction modes, and for the representation of the predicted motion information identifier that is actually encoded.
  • The method of the embodiments of the present application may be used for the establishment of a candidate motion information (matching block distance vector) list for intra prediction based on motion estimation, and for the representation of the predicted motion information identifier that is actually encoded.
  • The method of the embodiments of the present application may be used for the establishment of a candidate predicted motion information (matching block distance vector) list in the intra block copy mode of the SCC standard, and for the representation of the predicted motion information identifier that is actually encoded.
  • The method of the embodiments of the present application may be used for the establishment of an inter-frame, intra-view, or inter-view prediction candidate motion information list, and for the representation of the predicted motion information identifier that is actually encoded.
  • The method of the embodiments of the present application may be used for the establishment of an inter-frame, intra-frame, or inter-layer prediction candidate motion information list of a scalable coding standard, and for the representation of the predicted motion information identifier that is actually encoded.
  • The adjacent reconstructed image blocks of the to-be-processed block used to characterize the similarity of the reference image blocks (the templates in the foregoing specific embodiments) may be spatially adjacent reconstructed image blocks, temporally adjacent reconstructed image blocks, reconstructed image blocks of adjacent viewpoints, reconstructed image blocks of adjacent layers, scaled reconstructed image blocks, and the like.
  • FIG. 17 is a schematic block diagram of an encoding apparatus 1700 according to an embodiment of the present application, including:
  • the obtaining module 1701 is configured to acquire N candidate predicted motion information of the image block to be processed, where N is an integer greater than one;
  • a calculation module 1702, configured to acquire a distortion value corresponding to each of the N candidate prediction motion information, where the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information and the adjacent reconstructed image block of the to-be-processed image block;
  • a comparing module 1703, configured to determine, according to the magnitude relationship among the acquired N distortion values, first identification information of each of the N candidate prediction motion information, where the N candidate prediction motion information and the respective first identification information are in one-to-one correspondence;
  • an encoding module 1704, configured to: when the target predicted motion information of the to-be-processed image block is one of the N candidate predicted motion information for which the first identification information has been determined, encode the first identification information of the target predicted motion information into the code stream.
  • The adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed image block of the to-be-processed image block includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block. That the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block is identical in shape and equal in size to the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
  • The difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: a mean absolute difference of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of absolute differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of squared errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a mean squared error of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of absolute Hadamard-transformed differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a normalized cross-correlation metric of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or a similarity measure based on sequential similarity detection of the reference adjacent reconstructed image block and the original adjacent reconstructed image block.
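A few of the listed difference characterization values can be computed as follows. This is a sketch assuming both templates are equally sized flat lists of pixel values; the Hadamard, cross-correlation, and sequential-similarity variants are omitted, and the function name is assumed here.

```python
def diff_metrics(ref, org):
    """Difference characterization values between a reference adjacent
    reconstructed block and the original adjacent reconstructed block."""
    diffs = [r - o for r, o in zip(ref, org)]
    n = len(diffs)
    sad = sum(abs(d) for d in diffs)   # sum of absolute differences
    sse = sum(d * d for d in diffs)    # sum of squared errors
    return {
        "SAD": sad,
        "MAD": sad / n,                # mean absolute difference
        "SSE": sse,
        "MSE": sse / n,                # mean squared error
    }
```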
  • The to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the to-be-processed image block. This includes: the original adjacent reconstructed image block has a width of W and a height of n; or the original adjacent reconstructed image block has a width of W+H and a height of n; where W, H, and n are positive integers.
  • The to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the to-be-processed image block. This includes: the original adjacent reconstructed image block has a width of n and a height of H; or the original adjacent reconstructed image block has a width of n and a height of W+H; where W, H, and n are positive integers.
  • n is 1 or 2.
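The two template shapes can be written down directly. A sketch assuming image coordinates with the origin at the top-left, a block whose top-left corner is (x, y), and rectangles expressed as (x0, y0, width, height); all names here are assumptions for illustration.

```python
def upper_template(x, y, w, h, n, extended=False):
    """Template above the block: width W (or W + H when extended), height n,
    with its lower boundary touching the block's upper boundary."""
    return (x, y - n, (w + h) if extended else w, n)

def left_template(x, y, w, h, n, extended=False):
    """Template left of the block: width n, height H (or W + H when
    extended), with its right boundary touching the block's left boundary."""
    return (x - n, y, n, (w + h) if extended else h)
```

For the 16x8 block of method 1200 with n = 1 this gives a 16x1 upper template and a 1x8 left template; with extended=True the upper template of a 16x16 block becomes 32x1, as in method 1400.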
  • the encoding apparatus is used for inter-frame bidirectional prediction.
  • The reference image block indicated by the candidate prediction motion information includes a first reference image block and a second reference image block. Correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by a difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by calculating a pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by an average of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
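The two bidirectional options can be contrasted in a few lines. A sketch assuming flat-list templates and SAD as the difference characterization value; the function names are assumptions.

```python
def distortion_avg_template(ref0, ref1, org):
    """Option 1: average the two reference templates pixel by pixel first,
    then take one SAD against the original template."""
    avg = [(a + b) / 2 for a, b in zip(ref0, ref1)]
    return sum(abs(a - o) for a, o in zip(avg, org))

def distortion_avg_sad(ref0, ref1, org):
    """Option 2: one SAD per reference template, then the mean of the two."""
    sad0 = sum(abs(a - o) for a, o in zip(ref0, org))
    sad1 = sum(abs(b - o) for b, o in zip(ref1, org))
    return (sad0 + sad1) / 2
```

The two options generally give different values: averaging the templates first lets opposite-sign errors cancel before the absolute value is taken.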
  • The adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a plurality of reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; the adjacent reconstructed image block of the to-be-processed image block includes a plurality of original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  • The distortion value is obtained according to the following calculation formula:
  Distortion = Σ_{i=1}^{p} Diff_i
  where Distortion represents the distortion value, Diff_i represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
  • The comparison module 1703 is specifically configured to: compare the magnitudes of the N distortion values; and assign, according to the comparison result, the first identification information of each of the N candidate prediction motion information, where the length of the binary character string of the first identification information of candidate prediction motion information with a smaller distortion value is less than or equal to the length of the binary character string of the first identification information of candidate prediction motion information with a larger distortion value.
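The length rule of the comparing module 1703 can be illustrated with a truncated unary assignment; a sketch with assumed names, in which equal string lengths occur only for the two largest distortion values.

```python
def first_identification_info(distortions):
    """Assign bin strings over the small-to-large distortion order so that a
    smaller distortion value never gets a longer string than a larger one;
    the last (largest-distortion) code drops its trailing '1'."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    n = len(order)
    return {c: "0" * r + ("1" if r < n - 1 else "") for r, c in enumerate(order)}
```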
  • the comparison module 1703 is specifically configured to sequentially arrange the N candidate predicted motion information according to the distortion values from small to large or from large to small.
  • The acquiring module 1701 is specifically configured to: acquire, according to a preset sequence, N pieces of mutually different motion information of image blocks having a preset positional relationship with the to-be-processed image block, as the N candidate prediction motion information.
  • The acquiring module 1701 is specifically configured to: acquire, according to a preset sequence, M pieces of mutually different motion information of image blocks having a preset positional relationship with the to-be-processed image block, as M candidate prediction motion information, where the M candidate prediction motion information includes the N candidate prediction motion information and M is an integer greater than N; determine a grouping manner of the M candidate prediction motion information; and determine, according to the grouping manner, the N candidate prediction motion information from the M candidate prediction motion information.
  • The encoding module 1704 is further configured to: encode the grouping manner into the code stream.
  • The encoding module 1704 is further configured to: encode the N candidate prediction motion information into the code stream; or encode, into the code stream, second identification information indicating the N image blocks having the preset positional relationship with the to-be-processed image block; or encode, into the code stream, third identification information having a preset correspondence with the N candidate prediction motion information.
  • The encoding apparatus 1700 further includes a detecting module 1705. Before the distortion values corresponding to the N candidate prediction motion information are acquired, the detecting module 1705 is configured to: determine that an adjacent reconstructed image block of the to-be-processed image block exists.
  • The detecting module 1705 is specifically configured to: determine that at least one of the at least two original adjacent reconstructed image blocks exists.
  • The encoding apparatus 1700 further includes a decision module 1706. Before the distortion values corresponding to the N candidate prediction motion information are acquired, the decision module 1706 is configured to: determine to perform the acquiring of the distortion value corresponding to each of the N candidate predicted motion information.
  • The decision module 1706 is specifically configured to: determine, according to the grouping manner, to perform the acquiring of the distortion value corresponding to each of the N candidate prediction motion information.
  • The encoding module 1704 is further configured to: encode fourth identification information into the code stream, where the fourth identification information is used to determine that the acquiring of the distortion value corresponding to each of the N candidate predicted motion information is performed.
  • The obtaining module 1701 is further configured to: determine P candidate predicted motion information from the M candidate predicted motion information, where no identical candidate prediction motion information exists between the P candidate predicted motion information and the N candidate predicted motion information, P is a positive integer, and P is less than M-1.
  • FIG. 18 is a schematic block diagram of a decoding apparatus 1800 according to an embodiment of the present application, including:
  • the parsing module 1801 is configured to parse the target identification information of the target predicted motion information of the to-be-processed image block from the code stream;
  • An obtaining module 1802 configured to determine N candidate predicted motion information, where the N candidate predicted motion information includes the target predicted motion information, where N is an integer greater than one;
  • a calculation module 1803, configured to acquire a distortion value corresponding to each of the N candidate prediction motion information, where the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information and the adjacent reconstructed image block of the to-be-processed image block;
  • a comparison module 1804, configured to determine, according to the magnitude relationship among the acquired N distortion values, first identification information of each of the N candidate prediction motion information, where the N candidate prediction motion information and the respective first identification information are in one-to-one correspondence;
  • the selecting module 1805 is configured to determine candidate predicted motion information corresponding to the first identifier information that matches the target identifier information as the target predicted motion information.
  • The adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed image block of the to-be-processed image block includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block. That the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block is identical in shape and equal in size to the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
  • The difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: a mean absolute difference of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of absolute differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of squared errors of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a mean squared error of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a sum of absolute Hadamard-transformed differences of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; a normalized cross-correlation metric of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or a similarity measure based on sequential similarity detection of the reference adjacent reconstructed image block and the original adjacent reconstructed image block.
  • The to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the to-be-processed image block. This includes: the original adjacent reconstructed image block has a width of W and a height of n; or the original adjacent reconstructed image block has a width of W+H and a height of n; where W, H, and n are positive integers.
  • The to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the to-be-processed image block. This includes: the original adjacent reconstructed image block has a width of n and a height of H; or the original adjacent reconstructed image block has a width of n and a height of W+H; where W, H, and n are positive integers.
  • n is 1 or 2.
  • The reference image block indicated by the candidate prediction motion information includes a first reference image block and a second reference image block. Correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate prediction motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by a difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by calculating a pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by an average of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
  • The adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a plurality of reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; the adjacent reconstructed image block of the to-be-processed image block includes a plurality of original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  • The distortion value is obtained according to the following calculation formula:
  Distortion = Σ_{i=1}^{p} Diff_i
  where Distortion represents the distortion value, Diff_i represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
  • The comparing module 1804 is specifically configured to: compare the magnitudes of the N distortion values; and assign, according to the comparison result, the first identification information of each of the N candidate prediction motion information, where the length of the binary character string of the first identification information of candidate prediction motion information with a smaller distortion value is less than or equal to the length of the binary character string of the first identification information of candidate prediction motion information with a larger distortion value.
  • the comparison module 1804 is specifically configured to sequentially arrange the N candidate predicted motion information according to the distortion values from small to large or from large to small.
  • the acquiring module 1802 is specifically configured to: acquire, in a preset order, motion information of N mutually different image blocks having a preset positional relationship with the image block to be processed as the N candidate prediction motion information.
  • the acquiring module 1802 is specifically configured to: acquire, in a preset order, motion information of M mutually different image blocks having a preset positional relationship with the image block to be processed as M candidate prediction motion information, where the M candidate prediction motion information includes the N candidate prediction motion information and M is an integer greater than N; determine a grouping manner of the M candidate prediction motion information; and determine the N candidate prediction motion information from the M candidate prediction motion information according to the target identification information and the grouping manner.
  • the acquiring module 1802 is specifically configured to: determine a preset grouping manner; or obtain the grouping manner by parsing the code stream.
  • the acquiring module 1802 is specifically configured to: parse the code stream to obtain the N candidate prediction motion information; or parse second identification information in the code stream to obtain the N candidate image blocks indicated by the second identification information, and use the motion information of the N candidate image blocks as the N candidate prediction motion information; or parse third identification information in the code stream to obtain the N candidate prediction motion information having a preset correspondence with the third identification information.
  • the apparatus 1800 further includes:
  • the detecting module 1806 is configured to determine that adjacent reconstructed image blocks of the image block to be processed are available.
  • the detecting module 1806 is specifically configured to: determine that at least one original adjacent reconstructed image block of the at least two original adjacent reconstructed image blocks is available.
  • the apparatus further includes:
  • the determining module 1807 is configured to determine to perform the acquiring of the distortion value corresponding to each of the N candidate prediction motion information.
  • the determining module 1807 is specifically configured to: determine, according to the grouping manner, to perform the acquiring of the distortion value corresponding to each of the N candidate prediction motion information; or parse fourth identification information in the code stream to determine to perform the acquiring of the distortion value corresponding to each of the N candidate prediction motion information.
  • FIG. 19 shows a schematic block diagram of an apparatus 1900 in accordance with an embodiment of the present application.
  • the device includes:
  • a memory 1901, configured to store a program, where the program includes code
  • a transceiver 1902 for communicating with other devices
  • the processor 1903 is configured to execute program code in the memory 1901.
  • the processor 1903 may implement various operations of the method 1000 or the method 1100, and details are not described herein.
  • the transceiver 1902 is configured to perform specific signal transceiving under the driving of the processor 1903.
  • the functions described can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code via a computer readable medium and executed by a hardware-based processing unit.
  • the computer readable medium may comprise a computer readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium comprising any medium that facilitates transfer of a computer program from one place to another in accordance with a communication protocol.
  • in this manner, computer readable media may generally correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application.
  • the computer program product can comprise a computer readable medium.
  • the computer readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or the wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but instead are directed to non-transitory tangible storage media.
  • magnetic disks and optical discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer readable media.
  • processors such as one or more digital signal processors, general purpose microprocessors, application specific integrated circuits, field programmable gate arrays, or other equivalent integrated or discrete logic circuits.
  • processors may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec.
  • the techniques can be fully implemented in one or more circuits or logic elements.
  • the techniques of the present application can be implemented in a wide variety of devices or devices, including wireless handsets, integrated circuits (ICs), or a collection of ICs (eg, a chipset).
  • Various components, modules or units are described herein to emphasize functional aspects of the apparatus configured to perform the disclosed techniques, but do not necessarily need to be implemented by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or by interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Abstract

Provided are an encoding and decoding method and apparatus for predicted motion information about an image block. The decoding method comprises: parsing a code stream to obtain target identification information about target predicted motion information about an image block to be processed; determining N pieces of candidate predicted motion information, the N pieces of candidate predicted motion information comprising the target predicted motion information, where N is an integer greater than 1; acquiring a distortion value corresponding to each of the N pieces of candidate predicted motion information, the distortion value being determined by a reconstructed image block adjacent to a reference image block indicated by the candidate predicted motion information and a reconstructed image block adjacent to the image block to be processed; according to magnitude relationships of the acquired N distortion values, determining first identification information about each of the N pieces of candidate predicted motion information, the N pieces of candidate predicted motion information corresponding to respective pieces of first identification information on a one-to-one basis; and determining candidate predicted motion information corresponding to first identification information matching the target identification information to be the target predicted motion information.

Description

Encoding and decoding method and apparatus for motion information
This application claims priority to Chinese Patent Application No. 201710818690.5, filed with the Chinese Patent Office on September 12, 2017 and entitled "Encoding and decoding method and apparatus for motion information", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of video image technology, and in particular, to an encoding and decoding method and apparatus for motion information.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), notebook or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 advanced video coding (AVC), and the ITU-T H.265 high efficiency video coding (HEVC) standard, as well as in extensions of those standards, to transmit and receive digital video information more efficiently. Video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing these video codec techniques.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video coding, a video picture may be partitioned into video blocks, which may also be referred to as tree blocks, coding units (CUs), decoding units, decoding nodes, and the like. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Summary of the Invention
The present application introduces a prediction method. Specifically, when a block to be processed has a plurality of candidate prediction motion information, the similarity between the block to be processed and the reference image blocks indicated by the candidate prediction motion information of the block to be processed is used as a priori knowledge to assist in determining the coding mode of the identifier of each candidate prediction motion information, thereby saving coding bits and improving coding efficiency. In a feasible implementation, because the pixel values of the block to be processed cannot be obtained directly at the decoding end, the foregoing similarity is approximated by the similarity between the reconstructed pixel set around the block to be processed and the reconstructed pixel set corresponding to the reference image block; that is, the similarity between the reconstructed pixel set around the block to be processed and the reconstructed pixel set corresponding to the reference image block is used to characterize the similarity between the block to be processed and the reference image blocks indicated by the candidate prediction motion information of the block to be processed.
It should be understood that, taking the encoding end as an example, the embodiments of the present application are applicable to a scenario in which one reference image block is determined from a plurality of reference image blocks of a block to be processed and the identification information of that reference image block is encoded. This holds irrespective of whether the plurality of reference image blocks come from an inter-type prediction mode, an intra-type prediction mode, an inter-view prediction mode (multi-view or 3D video coding), or an inter-layer prediction mode (scalable video coding); irrespective of the specific method for obtaining the reference image blocks (for example, ATMVP or STMVP, or the intra block copy mode); and irrespective of whether the motion information indicating the reference image block is a motion vector belonging to the entire coding unit or motion information belonging to a sub-coding unit within the coding unit. All prediction modes that conform to the applicable scenarios of the embodiments of the present application, and all methods for obtaining reference image blocks (that is, methods for obtaining motion information), can follow or be combined with the solutions in the embodiments of the present application to achieve the technical effect of improving coding efficiency.
In a first aspect of the embodiments of the present application, an encoding method for predicted motion information of an image block is provided, including the step of acquiring N candidate prediction motion information of an image block to be processed, where N is an integer greater than 1 and the N candidate prediction motion information are different from each other. It should be understood that, when the motion information includes a motion vector and reference frame information, motion information being mutually different also covers the case where the motion vectors are the same but the reference frame information differs. The pruning technique has been introduced above; it should be understood that, in the process of obtaining the N candidate prediction motion information of the image block to be processed, a pruning operation is performed so that the finally obtained N candidate prediction motion information are different from each other, and details are not repeated here.

In a feasible implementation, the acquiring of the N candidate prediction motion information of the image block to be processed includes: acquiring, in a preset order, motion information of N mutually different image blocks having a preset positional relationship with the image block to be processed as the N candidate prediction motion information. In a feasible implementation, the acquiring of the N candidate prediction motion information of the image block to be processed includes: acquiring, in a preset order, motion information of M mutually different image blocks having a preset positional relationship with the image block to be processed as M candidate prediction motion information, where the M candidate prediction motion information includes the N candidate prediction motion information and M is an integer greater than N; determining a grouping manner of the M candidate prediction motion information; and determining the N candidate prediction motion information from the M candidate prediction motion information according to the grouping manner. Different groups may be processed in different manners or in the same manner. In a feasible implementation, the grouping manner is encoded into the code stream. In a feasible implementation, the grouping manner is fixed at the encoding end and the decoding end respectively by a preset protocol and kept consistent. In addition, the encoding end needs to make the decoding end aware of the specific candidate prediction motion information.

In a feasible implementation, the candidate prediction motion information is encoded into the code stream; or second identification information indicating the image blocks having the preset positional relationship with the image block to be processed is encoded into the code stream; or third identification information having a preset correspondence with the N candidate prediction motion information is encoded into the code stream. In a feasible implementation, the candidate prediction motion information is fixed at the encoding end and the decoding end respectively by a preset protocol and kept consistent.
Grouping the candidate prediction motion information and allowing different groups to be processed in different manners makes the encoding more flexible and reduces computational complexity.
The encoding method further includes the step of determining that the adjacent reconstructed image blocks of the image block to be processed are available. In a feasible implementation, when the adjacent reconstructed image blocks of the image block to be processed include at least two original adjacent reconstructed image blocks, the determining that the adjacent reconstructed image blocks of the image block to be processed are available includes: determining that at least one original adjacent reconstructed image block of the at least two original adjacent reconstructed image blocks is available. In some embodiments, identification information needs to be encoded to signal auxiliary information such as the foregoing grouping manner and/or the processing manner of each group; in such embodiments, the availability of the adjacent reconstructed image blocks of the image block to be processed may also be determined first.
When the adjacent reconstructed image blocks are not available, encoding can be performed directly in the conventional manner without further encoding the foregoing auxiliary information, thereby saving coding bits.
The encoding method further includes the step of acquiring the distortion value corresponding to each of the N candidate prediction motion information. The reference adjacent reconstructed image block has the same shape and the same size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the image block to be processed. The image block to be processed is a rectangle with width W and height H, and the original adjacent reconstructed image block is a rectangle. In a feasible implementation, the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the image block to be processed, and the original adjacent reconstructed image block has width W and height n. In a feasible implementation, the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the image block to be processed, and the original adjacent reconstructed image block has width W+H and height n. In a feasible implementation, the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the image block to be processed, and the original adjacent reconstructed image block has width n and height H. In a feasible implementation, the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the image block to be processed, and the original adjacent reconstructed image block has width n and height W+H. W, H, and n are positive integers.
In a feasible implementation, according to the line buffer requirement of the coding system, n may be set to 1 or 2, so that no additional storage space is needed to store the original adjacent reconstructed image blocks, which simplifies hardware implementation.
This step first needs to obtain the reference adjacent reconstructed image blocks of the reference image blocks of the image block to be processed that are indicated by the N candidate prediction motion information. In a feasible implementation, the motion vector in the candidate prediction motion information points to a sub-pixel position in the reference frame; in this case, sub-pixel-precision image interpolation needs to be performed on the reference frame image, or on a part of it, to obtain the reference adjacent reconstructed image block of the reference image block. An 8-tap filter {-1, 4, -11, 40, 40, -11, 4, -1} may be used for sub-pixel interpolation, or, to reduce computational complexity, a bilinear interpolation filter may be used for sub-pixel interpolation.
Using a simpler interpolation filter reduces the complexity of the algorithm implementation.
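A bilinear interpolation filter for fetching one sub-pixel sample of the reference template might look as follows; this is a floating-point sketch for illustration only, and does not reproduce the fixed-point arithmetic of an actual codec:

```python
import numpy as np

def bilinear_sample(img, y, x):
    """One sub-pixel sample by bilinear interpolation: a weighted mean of the
    four surrounding integer-position pixels. Assumes 0 <= y, x and that the
    2x2 neighborhood lies inside the image."""
    y0, x0 = int(y), int(x)
    dy, dx = y - y0, x - x0
    p = img.astype(np.float64)
    return ((1 - dy) * (1 - dx) * p[y0, x0] + (1 - dy) * dx * p[y0, x0 + 1]
            + dy * (1 - dx) * p[y0 + 1, x0] + dy * dx * p[y0 + 1, x0 + 1])
```

A half-pixel position is simply the mean of its four neighbors, which is why this filter is much cheaper than the 8-tap filter above.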
This step then calculates, as the distortion value, the difference characterization value between the reference adjacent reconstructed image block of the reference image block and the original adjacent reconstructed image block of the block to be processed. The difference characterization value may be calculated in a variety of ways, for example, mean absolute difference (MAD), sum of absolute differences (SAD), sum of squared differences (SSD), mean squared difference (MSD), sum of absolute Hadamard-transformed differences (SATD), a normalized cross-correlation (NCC) measure, or a similarity measure based on sequential similarity detection (SSDA), and so on. When there are a plurality of original adjacent reconstructed image blocks, assume that the plurality of original adjacent reconstructed image blocks include a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block, and correspondingly, the plurality of reference adjacent reconstructed image blocks include a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block. Then, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block. More generally, the distortion value is obtained according to the following formula:
Distortion = ∑_{i=1}^{p} |Delta(Original_i, Reference_i)|
where Distortion represents the distortion value, |Delta(Original_i, Reference_i)| represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value. Depending on the method actually used to calculate the difference value, Delta is the expression of one of the foregoing calculation manners such as MAD, SAD, or SSD.
In a feasible implementation, this embodiment of the present invention is applied to inter-frame bidirectional prediction. Assume that the reference image blocks indicated by the candidate prediction motion information include a first reference image block and a second reference image block, and correspondingly, the adjacent reconstructed image blocks of the reference image blocks indicated by the candidate prediction motion information include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by calculating the pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or, the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
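The two bidirectional variants described above can be sketched as follows, with SAD again standing in for the difference characterization value; the function names and the integer-division averaging are illustrative assumptions:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences as the difference characterization value."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def bidir_distortion_avg_block(original, ref0, ref1):
    """Variant 1: build the average reference adjacent reconstructed block
    (pixel-wise mean of the two reference templates), then compare it with
    the original adjacent reconstructed block."""
    avg = (ref0.astype(np.int64) + ref1.astype(np.int64)) // 2
    return sad(original, avg)

def bidir_distortion_avg_value(original, ref0, ref1):
    """Variant 2: mean of the two difference characterization values, one per
    reference adjacent reconstructed block."""
    return (sad(original, ref0) + sad(original, ref1)) / 2
```

Variant 1 mirrors how a bidirectional predictor is actually formed, while variant 2 avoids constructing the averaged template.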
In a feasible implementation, the image block to be processed has sub-block-level candidate prediction motion information. In this case, the distortion value corresponding to each sub-block adjacent to the original adjacent reconstructed image block may be obtained separately, and the sum of these values is used as the distortion value of the image block to be processed.
The encoding method further includes the step of determining the first identification information of each of the N candidate prediction motion information according to the magnitude relationship among the acquired N distortion values, where the N candidate prediction motion information are in one-to-one correspondence with their respective first identification information.
This step first compares the magnitudes of the N distortion values; specifically, the N candidate prediction motion information may be arranged in ascending or descending order of the distortion values. Then, the first identification information of each of the N candidate prediction motion information is assigned according to the comparison result, where the length of the binary character string of the first identification information of candidate prediction motion information having a smaller distortion value is less than or equal to, that is, not greater than, the length of the binary character string used to encode the first identification information of candidate prediction motion information having a larger distortion value.
相似度大（失真值小）的候选预测运动信息被最终选中成为预测信息的概率更大，对其赋予更短码字的二进制字符串来表示标识值，能够节省编码比特，提高编码效率。Candidate predicted motion information with higher similarity (a smaller distortion value) is more likely to be finally selected as the prediction information; assigning it a shorter binary codeword to represent its identification value saves coding bits and improves coding efficiency.
该编码方法还包括步骤：当所述待处理图像块的目标预测运动信息为所述已确定第一标识信息的N个候选预测运动信息中的一个时，将所述目标预测运动信息的第一标识信息编入码流。The encoding method further includes: when the target predicted motion information of the image block to be processed is one of the N candidate predicted motion information for which the first identification information has been determined, encoding the first identification information of the target predicted motion information into the bitstream.
在本申请实施例的第二方面提供了一种图像块预测运动信息的解码方法，包括：从码流中解析出待处理图像块的目标预测运动信息的目标标识信息；确定N个候选预测运动信息，所述N个候选预测运动信息包括所述目标预测运动信息，其中，N为大于1的整数；获取所述N个候选预测运动信息各自对应的失真值，所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定；根据所述获取的N个失真值之间的大小关系，确定所述N个候选预测运动信息各自的第一标识信息，所述N个候选预测运动信息和各自的第一标识信息一一对应；将与所述目标标识信息匹配的第一标识信息对应的候选预测运动信息确定为所述目标预测运动信息。A second aspect of the embodiments of this application provides a method for decoding predicted motion information of an image block, including: parsing, from a bitstream, target identification information of target predicted motion information of an image block to be processed; determining N candidate predicted motion information, where the N candidate predicted motion information includes the target predicted motion information and N is an integer greater than 1; acquiring a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the image block to be processed; determining first identification information of each of the N candidate predicted motion information according to the magnitude relationship among the acquired N distortion values, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence; and determining, as the target predicted motion information, the candidate predicted motion information corresponding to the first identification information that matches the target identification information.
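The decoder-side steps can be sketched as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, `distortion_of` stands in for the template-based distortion computation described above, and truncated unary binarization is assumed for the first identification information (matching the encoder-side sketch; the embodiment itself only requires a monotonic-length mapping).

```python
def decode_target_motion_info(target_id, candidates, distortion_of):
    """Recompute each candidate's distortion value from reconstructed
    neighboring samples (available identically at encoder and decoder),
    derive each candidate's first identification information the same way
    the encoder did, and return the candidate whose identification string
    matches the parsed target identification information."""
    distortions = [distortion_of(c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: distortions[i])
    for rank, cand_idx in enumerate(order):
        ident = "1" * rank + ("0" if rank < len(order) - 1 else "")
        if ident == target_id:
            return candidates[cand_idx]
    raise ValueError("no candidate matches the parsed identification")
```

Because the distortion values are derived only from already-reconstructed samples, the decoder reproduces the encoder's candidate ordering without any extra signaling beyond the target identification information.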
当待处理块具有多个候选预测运动矢量时，以待处理块和待处理块的候选预测运动矢量所指示的参考图像块之间的相似度为先验知识，来协助确定各个候选预测运动矢量的标识的编码方式，从而达到节省编码比特，提高编码效率的目的。本申请实施例第二方面的各实施方式与第一方面的编码方法相对应，有益技术效果相同，可参考第一方面关于技术效果的描述，不再赘述。When the block to be processed has multiple candidate predicted motion vectors, the similarity between the block to be processed and the reference image blocks indicated by its candidate predicted motion vectors is used as prior knowledge to assist in determining how the identifier of each candidate predicted motion vector is encoded, thereby saving coding bits and improving coding efficiency. The implementations of the second aspect of the embodiments of this application correspond to the encoding method of the first aspect and have the same beneficial technical effects; refer to the description of the technical effects in the first aspect, which is not repeated here.
在第二方面的一种可行的实施方式中，所述候选预测运动信息指示的参考图像块的相邻重构图像块包括参考相邻重构图像块，所述待处理图像块的相邻重构图像块包括与所述参考相邻重构图像块对应的原始相邻重构图像块，所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定，包括：所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示，所述参考相邻重构图像块与所述原始相邻重构图像块形状相同、大小相等，且所述参考相邻重构图像块和所述参考图像块之间的位置关系与所述原始相邻重构图像块和所述待处理图像块之间的位置关系相同。In a possible implementation of the second aspect, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed image block of the image block to be processed includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block. That the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the image block to be processed includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block and the original adjacent reconstructed image block have the same shape and size, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the image block to be processed.
在第二方面的一种可行的实施方式中，所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值，包括：所述参考相邻重构图像块和所述原始相邻重构图像块的平均绝对误差；所述参考相邻重构图像块和所述原始相邻重构图像块的绝对误差和；所述参考相邻重构图像块和所述原始相邻重构图像块的误差平方和；所述参考相邻重构图像块和所述原始相邻重构图像块的平均误差平方和；所述参考相邻重构图像块和所述原始相邻重构图像块的绝对哈达玛变换误差和；所述参考相邻重构图像块和所述原始相邻重构图像块的归一化积相关性度量值；或，所述参考相邻重构图像块和所述原始相邻重构图像块的基于序贯相似性检测的相似性度量值。In a possible implementation of the second aspect, the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute difference of the two blocks; the sum of absolute differences of the two blocks; the sum of squared errors of the two blocks; the mean squared error of the two blocks; the sum of absolute Hadamard-transformed differences of the two blocks; the normalized cross-correlation measure of the two blocks; or a similarity measure of the two blocks based on sequential similarity detection.
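A few of the listed difference characterization values can be sketched directly. This is an illustrative sketch (the function name is hypothetical); the Hadamard-transform sum, normalized cross-correlation, and sequential-similarity measures are omitted for brevity.

```python
import numpy as np

def difference_metrics(ref, orig):
    """Difference characterization values between a reference adjacent
    reconstructed block `ref` and the corresponding original adjacent
    reconstructed block `orig` (same shape, per the embodiment)."""
    d = ref.astype(np.int64) - orig.astype(np.int64)
    return {
        "SAD": int(np.abs(d).sum()),    # 绝对误差和: sum of absolute differences
        "MAD": float(np.abs(d).mean()), # 平均绝对误差: mean absolute difference
        "SSE": int((d * d).sum()),      # 误差平方和: sum of squared errors
        "MSE": float((d * d).mean()),   # 平均误差平方和: mean squared error
    }
```

Any one of these measures can serve as the `Delta` function in the distortion calculation; SAD is the cheapest to compute, while the squared-error measures penalize large sample differences more heavily.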
在第二方面的一种可行的实施方式中，所述待处理图像块为矩形，所述待处理图像块的宽为W，高为H，所述原始相邻重构图像块为矩形，所述原始相邻重构图像块的下边界与所述待处理图像块的上边界相邻，包括：所述原始相邻重构图像块的宽为W，高为n；或者，所述原始相邻重构图像块的宽为W+H，高为n；其中W，H，n为正整数。In a possible implementation of the second aspect, the image block to be processed is a rectangle of width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the image block to be processed, including: the original adjacent reconstructed image block has width W and height n; or the original adjacent reconstructed image block has width W+H and height n, where W, H, and n are positive integers.
在第二方面的一种可行的实施方式中,其特征在于,n为1或2。In a possible embodiment of the second aspect, characterized in that n is 1 or 2.
在第二方面的一种可行的实施方式中，所述待处理图像块为矩形，所述待处理图像块的宽为W，高为H，所述原始相邻重构图像块为矩形，所述原始相邻重构图像块的右边界与所述待处理图像块的左边界相邻，包括：所述原始相邻重构图像块的宽为n，高为H；或者，所述原始相邻重构图像块的宽为n，高为W+H；其中W，H，n为正整数。In a possible implementation of the second aspect, the image block to be processed is a rectangle of width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the image block to be processed, including: the original adjacent reconstructed image block has width n and height H; or the original adjacent reconstructed image block has width n and height W+H, where W, H, and n are positive integers.
在第二方面的一种可行的实施方式中,n为1或2。In a possible embodiment of the second aspect, n is 1 or 2.
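The template geometry described in the preceding implementations (an above template of width W or W+H and height n, and a left template of width n and height H or W+H, with n = 1 or 2) can be sketched as a simple array-slicing helper. This is an illustrative sketch only: the function name is hypothetical and boundary availability checks (whether the neighboring samples exist and are reconstructed) are omitted.

```python
import numpy as np

def get_templates(frame, x, y, W, H, n=2, extended=False):
    """Extract the original adjacent reconstructed templates for a W x H
    block whose top-left corner is at column x, row y of `frame`, a 2-D
    array of reconstructed samples. With `extended`, the above template is
    widened to W + H and the left template lengthened to W + H."""
    top_w = W + H if extended else W
    left_h = W + H if extended else H
    above = frame[y - n:y, x:x + top_w]   # n rows just above the block
    left = frame[y:y + left_h, x - n:x]   # n columns just left of the block
    return above, left
```

The reference templates are obtained the same way, at the position the candidate motion information points to in the reference picture, which keeps the positional relationship between template and block identical on both sides as the embodiment requires.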
在第二方面的一种可行的实施方式中，所述候选预测运动信息指示的参考图像块包括第一参考图像块和第二参考图像块，对应的，所述候选预测运动信息指示的参考图像块的相邻重构图像块包括第一参考相邻重构图像块和第二参考相邻重构图像块，对应的，所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示，包括：所述失真值由平均参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示，其中，所述平均参考相邻重构图像块由计算所述第一参考相邻重构图像块和所述第二参考相邻重构图像块的像素均值获得；或者，所述失真值由第一差异表征值和第二差异表征值的均值来表示，其中，所述第一差异表征值由所述第一参考相邻重构图像块和所述原始相邻重构图像块的所述差异表征值来表示，所述第二差异表征值由所述第二参考相邻重构图像块和所述原始相邻重构图像块的所述差异表征值来表示。In a possible implementation of the second aspect, the reference image block indicated by the candidate predicted motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by a difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing the pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
在第二方面的一种可行的实施方式中，所述候选预测运动信息指示的参考图像块的相邻重构图像块包括多个所述参考相邻重构图像块，所述多个所述参考相邻重构图像块包括第三参考相邻重构图像块和第四参考相邻重构图像块，对应的，所述待处理图像块的相邻重构图像块包括多个所述原始相邻重构图像块，所述多个所述原始相邻重构图像块包括第三原始相邻重构图像块和第四原始相邻重构图像块，所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示，包括：所述失真值由所述第三参考相邻重构图像块和所述第三原始相邻重构图像块的差异表征值以及所述第四参考相邻重构图像块和所述第四原始相邻重构图像块的差异表征值之和来表示。In a possible implementation of the second aspect, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes multiple reference adjacent reconstructed image blocks, the multiple reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; correspondingly, the adjacent reconstructed image block of the image block to be processed includes multiple original adjacent reconstructed image blocks, the multiple original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
在第二方面的一种可行的实施方式中,所述失真值根据如下计算式获得:In a possible implementation manner of the second aspect, the distortion value is obtained according to the following calculation formula:
Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|
其中，Distortion表示所述失真值，|Delta(Original_i, Reference_i)|表示第i个原始相邻重构图像块和第i个参考相邻重构图像块的所述差异表征值，p表示用于计算所述失真值的所述原始相邻重构图像块的个数。Here, Distortion represents the distortion value, |Delta(Original_i, Reference_i)| represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
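The summation above translates directly into code. This is a minimal sketch (the function name is hypothetical); `delta` may be any of the difference characterization values listed earlier, such as SAD.

```python
def total_distortion(template_pairs, delta):
    """Distortion = sum over i = 1..p of |Delta(Original_i, Reference_i)|,
    where `template_pairs` is a list of p (original, reference) adjacent
    reconstructed block pairs and `delta` is a difference characterization
    value function."""
    return sum(abs(delta(orig, ref)) for orig, ref in template_pairs)
```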
在第二方面的一种可行的实施方式中，所述根据所述获取的N个失真值之间的大小关系，确定所述N个候选预测运动信息各自的第一标识信息，包括：比较所述N个失真值之间的大小；按照所述比较结果赋予所述N个候选预测运动信息各自的第一标识信息，其中，所述失真值较小的候选预测运动信息的第一标识信息的二进制字符串的长度小于等于所述失真值较大的候选预测运动信息的第一标识信息的二进制字符串的长度。In a possible implementation of the second aspect, the determining of first identification information of each of the N candidate predicted motion information according to the magnitude relationship among the acquired N distortion values includes: comparing the magnitudes of the N distortion values; and assigning first identification information to each of the N candidate predicted motion information according to the comparison result, where the length of the binary string of the first identification information of a candidate with a smaller distortion value is less than or equal to the length of the binary string of the first identification information of a candidate with a larger distortion value.
在第二方面的一种可行的实施方式中，所述比较所述N个失真值之间的大小，包括：按照所述失真值从小到大或者从大到小的顺序，顺序排列所述N个候选预测运动信息。In a possible implementation of the second aspect, the comparing of the magnitudes of the N distortion values includes: arranging the N candidate predicted motion information in ascending or descending order of the distortion values.
在第二方面的一种可行的实施方式中，所述确定N个候选预测运动信息，包括：按照预设的顺序，获取N个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为所述N个候选预测运动信息。In a possible implementation of the second aspect, the determining of N candidate predicted motion information includes: acquiring, in a preset order, the motion information of N mutually different image blocks that have a preset positional relationship with the image block to be processed as the N candidate predicted motion information.
在第二方面的一种可行的实施方式中，所述确定N个候选预测运动信息，包括：按照预设的顺序，获取M个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为M个候选预测运动信息，其中，所述M个候选预测运动信息包括所述N个候选预测运动信息，M为大于N的整数；确定所述M个候选预测运动信息的分组方式；根据所述目标标识信息和所述分组方式，从所述M个候选预测运动信息中确定所述N个候选预测运动信息。In a possible implementation of the second aspect, the determining of N candidate predicted motion information includes: acquiring, in a preset order, the motion information of M mutually different image blocks that have a preset positional relationship with the image block to be processed as M candidate predicted motion information, where the M candidate predicted motion information includes the N candidate predicted motion information and M is an integer greater than N; determining a grouping manner of the M candidate predicted motion information; and determining the N candidate predicted motion information from the M candidate predicted motion information according to the target identification information and the grouping manner.
在第二方面的一种可行的实施方式中，所述确定所述M个候选预测运动信息的分组方式，包括：确定预设的所述分组方式；或者，从所述码流中解析获得所述分组方式。In a possible implementation of the second aspect, the determining of a grouping manner of the M candidate predicted motion information includes: determining a preset grouping manner; or parsing the bitstream to obtain the grouping manner.
在第二方面的一种可行的实施方式中，所述确定N个候选预测运动信息，包括：解析所述码流中的所述多个候选预测运动信息的编码信息，以获得所述N个候选预测运动信息；或者，解析所述码流中的第二标识信息，以获得所述第二标识信息指示的N个候选图像块，并以所述N个候选图像块的运动信息作为所述N个候选预测运动信息；或者，解析所述码流中的第三标识信息，以获得与所述第三标识信息具有预设对应关系的所述N个候选预测运动信息。In a possible implementation of the second aspect, the determining of N candidate predicted motion information includes: parsing coding information of the multiple candidate predicted motion information in the bitstream to obtain the N candidate predicted motion information; or parsing second identification information in the bitstream to obtain N candidate image blocks indicated by the second identification information, and using the motion information of the N candidate image blocks as the N candidate predicted motion information; or parsing third identification information in the bitstream to obtain the N candidate predicted motion information that have a preset correspondence with the third identification information.
在第二方面的一种可行的实施方式中，在所述获取所述N个候选预测运动信息各自对应的失真值之前，所述方法还包括：确定所述待处理图像块的相邻重构图像块可用。In a possible implementation of the second aspect, before the acquiring of the distortion value corresponding to each of the N candidate predicted motion information, the method further includes: determining that an adjacent reconstructed image block of the image block to be processed is available.
在第二方面的一种可行的实施方式中，当所述待处理图像块的相邻重构图像块包括至少两个所述原始相邻重构图像块时，所述确定所述待处理图像块的相邻重构图像块可用，包括：确定所述至少两个所述原始相邻重构图像块中的至少一个原始相邻重构图像块可用。In a possible implementation of the second aspect, when the adjacent reconstructed image block of the image block to be processed includes at least two of the original adjacent reconstructed image blocks, the determining that an adjacent reconstructed image block of the image block to be processed is available includes: determining that at least one of the at least two original adjacent reconstructed image blocks is available.
在第二方面的一种可行的实施方式中,在所述确定N个候选预测运动信息之后,所述方法还包括:确定执行所述获取所述N个候选预测运动信息各自对应的失真值。In a possible implementation manner of the second aspect, after the determining the N candidate prediction motion information, the method further includes: determining to perform the acquiring the distortion value corresponding to each of the N candidate prediction motion information.
在第二方面的一种可行的实施方式中，所述确定执行所述获取所述N个候选预测运动信息各自对应的失真值，包括：根据所述分组方式，确定执行所述获取所述N个候选预测运动信息各自对应的失真值；或者，解析所述码流中的第四标识信息以确定执行所述获取所述N个候选预测运动信息各自对应的失真值。In a possible implementation of the second aspect, the determining to perform the acquiring of the distortion value corresponding to each of the N candidate predicted motion information includes: determining, according to the grouping manner, to perform the acquiring of the distortion value corresponding to each of the N candidate predicted motion information; or parsing fourth identification information in the bitstream to determine to perform the acquiring of the distortion value corresponding to each of the N candidate predicted motion information.
本申请实施例的第三方面提供了一种图像块预测运动信息的编码装置，包括：获取模块，用于获取待处理图像块的N个候选预测运动信息，其中，N为大于1的整数；计算模块，用于获取所述N个候选预测运动信息各自对应的失真值，所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定；比较模块，用于根据所述获取的N个失真值之间的大小关系，确定所述N个候选预测运动信息各自的第一标识信息，所述N个候选预测运动信息和各自的第一标识信息一一对应；编码模块，用于当所述待处理图像块的目标预测运动信息为所述已确定第一标识信息的N个候选预测运动信息中的一个时，将所述目标预测运动信息的第一标识信息编入码流。A third aspect of the embodiments of this application provides an apparatus for encoding predicted motion information of an image block, including: an acquiring module, configured to acquire N candidate predicted motion information of an image block to be processed, where N is an integer greater than 1; a calculation module, configured to acquire a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the image block to be processed; a comparison module, configured to determine first identification information of each of the N candidate predicted motion information according to the magnitude relationship among the acquired N distortion values, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence; and an encoding module, configured to: when the target predicted motion information of the image block to be processed is one of the N candidate predicted motion information for which the first identification information has been determined, encode the first identification information of the target predicted motion information into the bitstream.
在第三方面的一种可行的实施方式中,所述编码装置还包括检测模块,用于确定所述待处理图像块的相邻重构图像块存在。In a possible implementation manner of the third aspect, the encoding apparatus further includes a detecting module, configured to determine that the adjacent reconstructed image block of the to-be-processed image block exists.
在第三方面的一种可行的实施方式中,所述编码装置还包括决策模块,用于确定执行所述获取所述N个候选预测运动信息各自对应的失真值。In a possible implementation manner of the third aspect, the encoding apparatus further includes a determining module, configured to determine to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information.
本申请实施例的第四方面提供了一种图像块预测运动信息的解码装置，包括：解析模块，用于从码流中解析出待处理图像块的目标预测运动信息的目标标识信息；获取模块，用于确定N个候选预测运动信息，所述N个候选预测运动信息包括所述目标预测运动信息，其中，N为大于1的整数；计算模块，用于获取所述N个候选预测运动信息各自对应的失真值，所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定；比较模块，用于根据所述获取的N个失真值之间的大小关系，确定所述N个候选预测运动信息各自的第一标识信息，所述N个候选预测运动信息和各自的第一标识信息一一对应；选择模块，用于将与所述目标标识信息匹配的第一标识信息对应的候选预测运动信息确定为所述目标预测运动信息。A fourth aspect of the embodiments of this application provides an apparatus for decoding predicted motion information of an image block, including: a parsing module, configured to parse, from a bitstream, target identification information of target predicted motion information of an image block to be processed; an acquiring module, configured to determine N candidate predicted motion information, where the N candidate predicted motion information includes the target predicted motion information and N is an integer greater than 1; a calculation module, configured to acquire a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the image block to be processed; a comparison module, configured to determine first identification information of each of the N candidate predicted motion information according to the magnitude relationship among the acquired N distortion values, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence; and a selection module, configured to determine, as the target predicted motion information, the candidate predicted motion information corresponding to the first identification information that matches the target identification information.
在第四方面的一种可行的实施方式中,所述解码装置还包括检测模块,用于确定所述待处理图像块的相邻重构图像块存在。In a possible implementation manner of the fourth aspect, the decoding apparatus further includes a detecting module, configured to determine that the adjacent reconstructed image block of the to-be-processed image block exists.
在第四方面的一种可行的实施方式中,所述解码装置还包括决策模块,用于确定执行所述获取所述N个候选预测运动信息各自对应的失真值。In a possible implementation manner of the fourth aspect, the decoding apparatus further includes a determining module, configured to determine to perform the acquiring the distortion value corresponding to each of the N candidate predicted motion information.
本申请实施例的第五方面提供了一种图像块预测运动信息的处理装置，包括：存储器，用于存储程序，所述程序包括代码；收发器，用于和其他设备进行通信；处理器，用于执行存储器中的程序代码。可选地，当所述代码被执行时，所述处理器可以实现第一方面所述的方法或者第二方面所述的方法的各个操作；收发器用于在处理器的驱动下执行具体的信号收发。A fifth aspect of the embodiments of this application provides a processing apparatus for predicted motion information of an image block, including: a memory, configured to store a program, where the program includes code; a transceiver, configured to communicate with other devices; and a processor, configured to execute the program code in the memory. Optionally, when the code is executed, the processor may implement the operations of the method of the first aspect or the method of the second aspect, and the transceiver performs specific signal transmission and reception driven by the processor.
本申请实施例的第六方面提供了一种计算机存储介质，用于储存为上述编码装置、解码装置或处理装置所用的计算机软件指令，其包含用于执行上述第一方面或第二方面的方法所设计的程序。A sixth aspect of the embodiments of this application provides a computer storage medium configured to store computer software instructions for the foregoing encoding apparatus, decoding apparatus, or processing apparatus, the instructions including a program designed to perform the method of the first aspect or the second aspect.
本申请实施例的第七方面提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面或第二方面的方法。A seventh aspect of an embodiment of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the first aspect or the second aspect described above.
应理解,本申请实施例的第二至第七方面和第一方面的发明目的相同,技术实现一致,技术效果类似,可以参考第一方面对应技术特征的描述,不再赘述。It should be understood that the second to seventh aspects of the embodiments of the present application are the same as the objects of the first aspect, the technical implementation is consistent, and the technical effects are similar. Reference may be made to the description of the corresponding technical features in the first aspect, and details are not described herein.
附图说明BRIEF DESCRIPTION OF DRAWINGS
图1为本申请实施例中视频编码及解码系统的一种示意性框图;1 is a schematic block diagram of a video encoding and decoding system in an embodiment of the present application;
图2为本申请实施例中视频编码器的一种示意性框图;2 is a schematic block diagram of a video encoder in an embodiment of the present application;
图3为本申请实施例中视频解码器的一种示意性框图;3 is a schematic block diagram of a video decoder in an embodiment of the present application;
图4为本申请实施例中帧间预测模块的一种示意性框图;4 is a schematic block diagram of an inter prediction module in an embodiment of the present application;
图5为本申请实施例中合并预测模式的一种示例性流程图;FIG. 5 is an exemplary flowchart of a merge prediction mode in an embodiment of the present application;
图6为本申请实施例中高级运动矢量预测模式的一种示例性流程图;6 is an exemplary flowchart of an advanced motion vector prediction mode in an embodiment of the present application;
图7为本申请实施例中由视频解码器执行的运动补偿的一种示例性流程图;FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application; FIG.
图8为本申请实施例中编码单元及与其关联的相邻位置图像块的一种示例性示意图;FIG. 8 is a schematic diagram of an encoding unit and an adjacent position image block associated therewith according to an embodiment of the present application; FIG.
图9为本申请实施例中构建候选预测运动矢量列表的一种示例性流程图;FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in the embodiment of the present application;
图10为本申请实施例中将经过组合的候选运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图;10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application;
图11为本申请实施例中将经过缩放的候选运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图;11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate motion vector list in the embodiment of the present application;
图12为本申请实施例中将零运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图;12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate motion vector list in the embodiment of the present application;
图13为本申请实施例的一种编码方法的示意性流程图;FIG. 13 is a schematic flowchart of an encoding method according to an embodiment of the present application;
图14为本申请实施例中参考相邻重构图像块与原始相邻重构图像块关系的一种示意图;14 is a schematic diagram of a relationship between an adjacent reconstructed image block and an original adjacent reconstructed image block in the embodiment of the present application;
图15为本申请实施例中子块级运动信息处理方式的一种示意图;15 is a schematic diagram of a sub-block level motion information processing manner according to an embodiment of the present application;
图16为本申请实施例的一种解码方法的示意性流程图;FIG. 16 is a schematic flowchart of a decoding method according to an embodiment of the present application;
图17为本申请实施例的一种编码装置的示意性框图;FIG. 17 is a schematic block diagram of an encoding apparatus according to an embodiment of the present application; FIG.
图18为本申请实施例的一种解码装置的示意性框图;FIG. 18 is a schematic block diagram of a decoding apparatus according to an embodiment of the present application; FIG.
图19为本申请实施例的一种装置的示意性框图。FIG. 19 is a schematic block diagram of an apparatus according to an embodiment of the present application.
具体实施方式DETAILED DESCRIPTION
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。The technical solutions in the embodiments of the present application will be clearly and completely described in the following with reference to the accompanying drawings in the embodiments.
图1为本申请实施例中视频编码及解码系统10的一种示意性框图。如图1中所展示,系统10包含源装置12,源装置12产生将在稍后时间由目的地装置14解码的经编码视频数据。源装置12及目的地装置14可包括多种装置中的任一者,包含桌上型计算机、笔记型计算机、平板计算机、机顶盒、例如所谓的“智能”电话的电话手机、所谓的“智能”触控板、电视、摄影机、显示装置、数字媒体播放器、视频游戏控制台、视频流式传输装置或类似者。在一些应用中,源装置12及目的地装置14可以用于无线通信。FIG. 1 is a schematic block diagram of a video encoding and decoding system 10 in an embodiment of the present application. As shown in FIG. 1, system 10 includes source device 12 that produces encoded video data that will be decoded by destination device 14 at a later time. Source device 12 and destination device 14 may comprise any of a variety of devices, including desktop computers, notebook computers, tablet computers, set top boxes, telephone handsets such as so-called "smart" phones, so-called "smart" Touchpads, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices or the like. In some applications, source device 12 and destination device 14 can be used for wireless communication.
目的地装置14可经由传输信道16接收待解码的经编码视频数据。传输信道16可包括能够将经编码视频数据从源装置12移动到目的地装置14的任何类型的媒体或装置。在一个可行的实施方式中,传输信道16可包括使源装置12能够实时将经编码视频数据直接传输到目的地装置14的通信媒体。可根据通信标准(例如,无线通信协议)调制经编码视频数据,并将其传输到目的地装置14。通信媒体可包括任何无线或有线通信媒体,例如射频频谱或一个或多个物理传输线。通信媒体可形成基于包的网络(例如,局域网、广域网或因特网的全球网络)的部分。通信媒体可包含路由器、交换器、基站或可有用于促进从源装置12到目的地装置14的通信的任何其它装备。 Destination device 14 may receive encoded video data to be decoded via transmission channel 16. Transport channel 16 may include any type of media or device capable of moving encoded video data from source device 12 to destination device 14. In one possible implementation, transport channel 16 may include communication media that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. The encoded video data may be modulated according to a communication standard (e.g., a wireless communication protocol) and transmitted to destination device 14. Communication media can include any wireless or wired communication medium, such as a radio frequency spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network (eg, a global network of local area networks, wide area networks, or the Internet). Communication media can include routers, switches, base stations, or any other equipment that can be used to facilitate communication from source device 12 to destination device 14.
替代地，可将经编码数据从输出接口输出到存储装置。类似地，可由输入接口从存储装置存取经编码数据。存储装置可包含多种分散式或本地存取的数据存储媒体中的任一者，例如，硬盘驱动器、蓝光光盘、DVD、CD-ROM、快闪存储器、易失性或非易失性存储器或用于存储经编码视频数据的任何其它合适的数字存储媒体。在另一可行的实施方式中，存储装置可对应于文件服务器或可保持由源装置12产生的经编码视频的另一中间存储装置。目的地装置14可经由流式传输或下载从存储装置存取所存储视频数据。文件服务器可为能够存储经编码视频数据且将此经编码视频数据传输到目的地装置14的任何类型的服务器。可行的实施方式中，文件服务器包含网站服务器、文件传送协议服务器、网络附接存储装置或本地磁盘机。目的地装置14可经由包含因特网连接的任何标准数据连接存取经编码视频数据。此数据连接可包含适合于存取存储于文件服务器上的经编码视频数据的无线信道（例如，Wi-Fi连接）、有线连接（例如，缆线调制解调器等）或两者的组合。经编码视频数据从存储装置的传输可为流式传输、下载传输或两者的组合。Alternatively, the encoded data may be output from the output interface to a storage device. Similarly, encoded data may be accessed from the storage device by an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data. In another possible implementation, the storage device may correspond to a file server or another intermediate storage device that holds the encoded video produced by source device 12. Destination device 14 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to destination device 14. In possible implementations, the file server includes a web server, a file transfer protocol server, a network attached storage device, or a local disk drive. Destination device 14 may access the encoded video data via any standard data connection, including an Internet connection. This data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a cable modem), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination of both.
本申请的技术不必限于无线应用或设定。技术可应用于视频解码以支持多种多媒体应用中的任一者，例如，空中电视广播、有线电视传输、卫星电视传输、流式传输视频传输(例如，经由因特网)、编码数字视频以用于存储于数据存储媒体上、解码存储于数据存储媒体上的数字视频或其它应用。在一些可行的实施方式中，系统10可经配置以支持单向或双向视频传输以支持例如视频流式传输、视频播放、视频广播和/或视频电话的应用。The techniques of this application are not necessarily limited to wireless applications or settings. The techniques may be applied to video decoding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some possible implementations, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
在图1的可行的实施方式中，源装置12包含视频源18、视频编码器20及输出接口。在一些应用中，输出接口可包含调制解调器(Modem)22和/或传输器24。在源装置12中，视频源18可包含例如以下各者的源：视频捕获装置(例如，摄像机)、含有先前捕获的视频的视频存档、用以从视频内容提供者接收视频的视频馈入接口，和/或用于产生计算机图形数据作为源视频的计算机图形系统，或这些源的组合。作为一种可行的实施方式，如果视频源18为摄像机，那么源装置12及目的装置14可形成所谓的摄影机电话或视频电话。本申请中所描述的技术可示例性地适用于视频解码，且可适用于无线和/或有线应用。In the possible implementation of FIG. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface. In some applications, the output interface may include a modem (Modem) 22 and/or a transmitter 24. In source device 12, video source 18 may include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video, a video feed interface for receiving video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one possible implementation, if video source 18 is a video camera, source device 12 and destination device 14 may form a so-called camera phone or video phone. The techniques described in this application are illustratively applicable to video decoding, and may be applied to wireless and/or wired applications.
可由视频编码器20来编码所捕获、预捕获或计算机产生的视频。经编码视频数据可经由源装置12的输出接口直接传输到目的地装置14。经编码视频数据也可(或替代地)存储到存储装置上以供稍后由目的地装置14或其它装置存取以用于解码和/或播放。Captured, pre-captured, or computer generated video may be encoded by video encoder 20. The encoded video data can be transmitted directly to the destination device 14 via the output interface of the source device 12. The encoded video data may also (or alternatively) be stored on a storage device for later access by the destination device 14 or other device for decoding and/or playback.
目的地装置14包含输入接口、视频解码器30及显示装置32。在一些应用中，输入接口可包含接收器26和/或调制解调器28。目的地装置14的输入接口28经由传输信道16接收经编码视频数据。经由传输信道16传达或提供于存储装置上的经编码视频数据可包含由视频编码器20产生以供视频解码器30的视频解码器使用以解码视频数据的多种语法元素。这些语法元素可与在通信媒体上传输、存储于存储媒体上或存储于文件服务器上的经编码视频数据包含在一起。 Destination device 14 includes an input interface, a video decoder 30, and a display device 32. In some applications, the input interface may include a receiver 26 and/or a modem 28. The input interface 28 of destination device 14 receives the encoded video data via transmission channel 16. The encoded video data communicated over transmission channel 16, or provided on the storage device, may include a variety of syntax elements produced by video encoder 20 for use by video decoder 30 in decoding the video data. These syntax elements may be included with the encoded video data transmitted over a communication medium, stored on a storage medium, or stored on a file server.
显示装置32可与目的地装置14集成或在目的地装置14外部。在一些可行的实施方式中,目的地装置14可包含集成显示装置且也经配置以与外部显示装置接口连接。在其它可行的实施方式中,目的地装置14可为显示装置。一般来说,显示装置32向用户显示经解码视频数据,且可包括多种显示装置中的任一者,例如液晶显示器、等离子显示器、有机发光二极管显示器或另一类型的显示装置。 Display device 32 may be integrated with destination device 14 or external to destination device 14. In some possible implementations, destination device 14 can include an integrated display device and is also configured to interface with an external display device. In other possible implementations, the destination device 14 can be a display device. In general, display device 32 displays decoded video data to a user and may include any of a variety of display devices, such as a liquid crystal display, a plasma display, an organic light emitting diode display, or another type of display device.
视频编码器20及视频解码器30可根据例如目前在开发中的下一代视频编解码压缩标准(H.266)操作且可遵照H.266测试模型(JEM)。替代地，视频编码器20及视频解码器30可根据例如ITU-TH.265标准，也称为高效率视频解码标准，或者，ITU-TH.264标准的其它专属或工业标准或这些标准的扩展而操作，ITU-TH.264标准替代地被称为MPEG-4第10部分，也称高级视频编码(advanced video coding,AVC)。然而，本申请的技术不限于任何特定解码标准。视频压缩标准的其它可行的实施方式包含MPEG-2和ITU-TH.263。 Video encoder 20 and video decoder 30 may operate according to, for example, the next-generation video codec compression standard (H.266) currently under development, and may conform to the H.266 test model (JEM). Alternatively, video encoder 20 and video decoder 30 may operate according to, for example, the ITU-T H.265 standard, also referred to as the High Efficiency Video Coding standard, or according to other proprietary or industry standards such as the ITU-T H.264 standard or extensions of these standards; the ITU-T H.264 standard is alternatively referred to as MPEG-4 Part 10, also known as advanced video coding (AVC). However, the techniques of this application are not limited to any particular decoding standard. Other possible implementations of video compression standards include MPEG-2 and ITU-T H.263.
尽管未在图1中展示，但在一些方面中，视频编码器20及视频解码器30可各自与音频编码器及解码器集成，且可包含适当多路复用器-多路分用器(MUX-DEMUX)单元或其它硬件及软件以处置共同数据流或单独数据流中的音频及视频两者的编码。如果适用，那么在一些可行的实施方式中，MUX-DEMUX单元可遵照ITUH.223多路复用器协议或例如用户数据报协议(UDP)的其它协议。Although not shown in FIG. 1, in some aspects video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. If applicable, in some possible implementations the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the user datagram protocol (UDP).
视频编码器20及视频解码器30各自可实施为多种合适编码器电路中的任一者，例如，一个或多个微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑、软件、硬件、固件或其任何组合。在技术部分地以软件实施时，装置可将软件的指令存储于合适的非瞬态计算机可读媒体中且使用一个或多个处理器以硬件执行指令，以执行本申请的技术。视频编码器20及视频解码器30中的每一者可包含于一个或多个编码器或解码器中，其中的任一者可在相应装置中集成为组合式编码器/解码器(CODEC)的部分。 Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the techniques are implemented partially in software, a device may store the instructions of the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this application. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
本申请示例性地可涉及视频编码器20将特定信息“用信号发送”到例如视频解码器30的另一装置。然而，应理解，视频编码器20可通过将特定语法元素与视频数据的各种经编码部分相关联来用信号发送信息。即，视频编码器20可通过将特定语法元素存储到视频数据的各种经编码部分的头信息来“用信号发送”数据。在一些应用中，这些语法元素可在通过视频解码器30接收及解码之前经编码及存储(例如，存储到存储系统34或文件服务器36)。因此，术语“用信号发送”示例性地可指语法元素或用于解码经压缩视频数据的其它数据的传达，而不管此传达是实时或近实时地发生或在时间跨度内发生，例如可在编码时将语法元素存储到媒体时发生，语法元素接着可在存储到此媒体之后的任何时间通过解码装置检索。This application may illustratively involve video encoder 20 "signaling" particular information to another device, such as video decoder 30. However, it should be understood that video encoder 20 may signal information by associating particular syntax elements with various encoded portions of the video data. That is, video encoder 20 may "signal" data by storing particular syntax elements in the header information of various encoded portions of the video data. In some applications, these syntax elements may be encoded and stored (e.g., stored to storage system 34 or file server 36) before being received and decoded by video decoder 30. Thus, the term "signaling" may illustratively refer to the communication of syntax elements or other data used to decode the compressed video data, whether this communication occurs in real time, in near real time, or over a span of time, for example as may occur when syntax elements are stored to a medium at encoding time; the syntax elements may then be retrieved by a decoding device at any time after being stored to this medium.
JCT-VC开发了H.265(HEVC)标准。HEVC标准化基于称作HEVC测试模型(HM)的视频解码装置的演进模型。H.265的最新标准文档可从http://www.itu.int/rec/T-REC-H.265获得，最新版本的标准文档为H.265(12/16)，该标准文档以全文引用的方式并入本文中。HM假设视频解码装置相对于ITU-TH.264/AVC的现有算法具有若干额外能力。例如，H.264提供9种帧内预测编码模式，而HM可提供多达35种帧内预测编码模式。The JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM). The latest standard document for H.265 is available from http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and that standard document is incorporated herein by reference in its entirety. The HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides nine intra-prediction coding modes, whereas the HM can provide up to 35 intra-prediction coding modes.
JVET致力于开发H.266标准。H.266标准化的过程基于称作H.266测试模型的视频解码装置的演进模型。H.266的算法描述可从http://phenix.int-evry.fr/jvet获得，其中最新的算法描述包含于JVET-F1001-v2中，该算法描述文档以全文引用的方式并入本文中。同时，可从https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/获得JEM测试模型的参考软件，同样以全文引用的方式并入本文中。The JVET is devoted to developing the H.266 standard. The process of H.266 standardization is based on an evolution model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available from http://phenix.int-evry.fr/jvet, and the latest algorithm description is contained in JVET-F1001-v2; that algorithm description document is incorporated herein by reference in its entirety. Meanwhile, the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is likewise incorporated herein by reference in its entirety.
应理解，在一些实施例中，运动信息包括运动矢量和参考帧信息，而在另一些实施例中，当参考帧信息确定的情况下，运动矢量为运动信息中的主要研究对象。为了描述方便，当下文仅以运动矢量或者预测运动矢量为描述对象的时候，参考帧信息为确定的、隐含确定的(比如帧内预测中使用本申请实施例的方法，或者，只有一帧参考帧等情况)或者可以使用本申请实施例(比如使用和对运动矢量相同的处理方法)或其它实施例确定的信息，而并不意味可以忽略参考帧信息。一般来说，HM的模型描述可将视频帧或图像划分成包含亮度及色度样本两者的树块或最大编码单元(largest coding unit,LCU)的序列，LCU也被称为CTU。树块具有与H.264标准的宏块类似的目的。条带包含按解码次序的数个连续树块。可将视频帧或图像分割成一个或多个条带。可根据四叉树将每一树块分裂成编码单元。例如，可将作为四叉树的根节点的树块分裂成四个子节点，且每一子节点可又为母节点且被分裂成另外四个子节点。作为四叉树的叶节点的最终不可分裂的子节点包括解码节点，例如，经解码视频块。与经解码码流相关联的语法数据可定义树块可分裂的最大次数，且也可定义解码节点的最小大小。It should be understood that in some embodiments, the motion information includes a motion vector and reference frame information, while in other embodiments, when the reference frame information is determined, the motion vector is the main object of interest in the motion information. For convenience of description, when only a motion vector or a predicted motion vector is taken as the object of description below, the reference frame information is determined, is implicitly determined (for example, when the method of the embodiments of this application is used in intra prediction, or when there is only one reference frame), or can be determined using the embodiments of this application (for example, using the same processing method as for the motion vector) or other embodiments; this does not mean that the reference frame information can be ignored. In general, the model description of the HM may divide a video frame or image into a sequence of treeblocks or largest coding units (largest coding unit, LCU) that contain both luma and chroma samples; an LCU is also referred to as a CTU. A treeblock has a purpose similar to that of a macroblock of the H.264 standard. A slice contains several consecutive treeblocks in decoding order. A video frame or image may be partitioned into one or more slices. Each treeblock may be split into coding units according to a quadtree. For example, a treeblock serving as the root node of the quadtree may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. The final unsplittable child nodes serving as leaf nodes of the quadtree comprise decoding nodes, e.g., decoded video blocks. Syntax data associated with the decoded code stream may define the maximum number of times a treeblock can be split, and may also define the minimum size of a decoding node.
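For illustration only, the recursive quadtree splitting of a treeblock described above may be sketched as follows. This is an explanatory model, not part of any embodiment or standard: the function names are invented, and the split decision is passed in as a callback, whereas in a real codec the split decisions are signalled in the code stream.

```python
def quadtree_split(x, y, size, min_size, should_split):
    """Recursively split a square block; return the list of leaf blocks.

    should_split(x, y, size) decides whether a block is split further
    (stand-in for the split flags a real decoder parses from the stream).
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf node -> a decoding node (e.g. a CU)
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return leaves

# Example: a 64x64 treeblock; only the top-left 32x32 quadrant splits again,
# yielding four 16x16 leaves plus the three remaining 32x32 leaves.
leaves = quadtree_split(
    0, 0, 64, 8,
    lambda x, y, s: s == 64 or (s == 32 and x == 0 and y == 0))
```

Here `min_size` plays the role of the minimum decoding-node size that the syntax data may define, and the recursion depth is bounded by the maximum number of splits.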
编码单元包含解码节点及预测单元(prediction unit,PU)以及与解码节点相关联的变换单元(transform unit,TU)。CU的大小对应于解码节点的大小且形状必须为正方形。CU的大小的范围可为8×8像素直到最大64×64像素或更大的树块的大小。每一CU可含有一个或多个PU及一个或多个TU。例如，与CU相关联的语法数据可描述将CU分割成一个或多个PU的情形。分割模式在CU是被跳过或经直接模式编码、帧内预测模式编码或帧间预测模式编码的情形之间可为不同的。PU可经分割成形状为非正方形。例如，与CU相关联的语法数据也可描述根据四叉树将CU分割成一个或多个TU的情形。TU的形状可为正方形或非正方形。A coding unit includes a decoding node, prediction units (prediction unit, PU), and transform units (transform unit, TU) associated with the decoding node. The size of the CU corresponds to the size of the decoding node, and the shape must be square. The size of the CU may range from 8×8 pixels up to the size of a treeblock of at most 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, syntax data associated with a CU may describe the partitioning of the CU into one or more PUs. The partition mode may differ depending on whether the CU is skip-mode or direct-mode encoded, intra-prediction-mode encoded, or inter-prediction-mode encoded. A PU may be partitioned into non-square shapes. For example, syntax data associated with a CU may also describe the partitioning of the CU into one or more TUs according to a quadtree. The shape of a TU may be square or non-square.
HEVC标准允许根据TU进行变换，TU对于不同CU来说可为不同的。TU通常基于针对经分割LCU定义的给定CU内的PU的大小而设定大小，但情况可能并非总是如此。TU的大小通常与PU相同或小于PU。在一些可行的实施方式中，可使用称作“残余四叉树”(residual quadtree,RQT)的四叉树结构将对应于CU的残余样本再分成较小单元。RQT的叶节点可被称作TU。可变换与TU相关联的像素差值以产生变换系数，变换系数可被量化。The HEVC standard allows transforms according to TUs, which may differ for different CUs. The TUs are typically sized based on the size of the PUs within a given CU defined for the partitioned LCU, although this may not always be the case. The size of a TU is typically the same as or smaller than the PU. In some possible implementations, the residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a "residual quadtree" (residual quadtree, RQT). The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with the TUs may be transformed to produce transform coefficients, and the transform coefficients may be quantized.
一般来说，PU包含与预测过程有关的数据。例如，在PU经帧内模式编码时，PU可包含描述PU的帧内预测模式的数据。作为另一可行的实施方式，在PU经帧间模式编码时，PU可包含界定PU的运动矢量的数据。例如，界定PU的运动矢量的数据可描述运动矢量的水平分量、运动矢量的垂直分量、运动矢量的分辨率(例如，四分之一像素精确度或八分之一像素精确度)、运动矢量所指向的参考图像，和/或运动矢量的参考图像列表(例如，列表0、列表1或列表C)。In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing the intra-prediction mode of the PU. As another possible implementation, when the PU is inter-mode encoded, the PU may include data defining a motion vector of the PU. For example, the data defining the motion vector of a PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or a reference image list for the motion vector (e.g., list 0, list 1, or list C).
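As an illustrative aside, quarter-pixel precision as mentioned above means that motion vector components are stored in units of 1/4 pixel. The following sketch shows this conversion; the helper names are invented for the example and are not any standard's syntax:

```python
# Illustrative quarter-pel motion vector representation: 4 units per pixel.
def to_quarter_pel(dx_pixels, dy_pixels):
    """Convert a displacement in pixels to quarter-pel MV components."""
    return (round(dx_pixels * 4), round(dy_pixels * 4))

def to_pixels(mvx, mvy):
    """Convert quarter-pel MV components back to a pixel displacement."""
    return (mvx / 4.0, mvy / 4.0)

# A displacement of 1.25 pixels right and 3 pixels up:
mv = to_quarter_pel(1.25, -3.0)
```

Eighth-pixel precision would use 8 units per pixel instead of 4; the principle is the same.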
一般来说，TU使用变换及量化过程。具有一个或多个PU的给定CU也可包含一个或多个TU。在预测之后，视频编码器20可计算对应于PU的残余值。残余值包括像素差值，像素差值可变换成变换系数、经量化且使用TU扫描以产生串行化变换系数以用于熵解码。本申请通常使用术语“视频块”来指CU的解码节点。在一些特定应用中，本申请也可使用术语“视频块”来指包含解码节点以及PU及TU的树块，例如，LCU或CU。In general, a TU uses transform and quantization processes. A given CU having one or more PUs may also include one or more TUs. After prediction, video encoder 20 may calculate the residual values corresponding to the PU. The residual values comprise pixel difference values, which may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy decoding. This application generally uses the term "video block" to refer to a decoding node of a CU. In some specific applications, this application may also use the term "video block" to refer to a treeblock that includes a decoding node as well as PUs and TUs, e.g., an LCU or CU.
视频序列通常包含一系列视频帧或图像。图像群组(group of picture,GOP)示例性地包括一系列、一个或多个视频图像。GOP可在GOP的头信息中、图像中的一者或多者的头信息中或在别处包含语法数据，语法数据描述包含于GOP中的图像的数目。图像的每一条带可包含描述相应图像的编码模式的条带语法数据。视频编码器20通常对个别视频条带内的视频块进行操作以便编码视频数据。视频块可对应于CU内的解码节点。视频块可具有固定或变化的大小，且可根据指定解码标准而在大小上不同。A video sequence typically includes a series of video frames or images. A group of pictures (group of picture, GOP) illustratively comprises a series of one or more video images. The GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere; the syntax data describes the number of images included in the GOP. Each slice of an image may include slice syntax data describing the encoding mode of the corresponding image. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
作为一种可行的实施方式，HM支持各种PU大小的预测。假定特定CU的大小为2N×2N，HM支持2N×2N或N×N的PU大小的帧内预测，及2N×2N、2N×N、N×2N或N×N的对称PU大小的帧间预测。HM也支持2N×nU、2N×nD、nL×2N及nR×2N的PU大小的帧间预测的不对称分割。在不对称分割中，CU的一方向未分割，而另一方向分割成25%及75%。对应于25%区段的CU的部分由“n”后跟着“上(U)”、“下(D)”、“左(L)”或“右(R)”的指示来指示。因此，例如，“2N×nU”指水平分割的2N×2NCU，其中2N×0.5NPU在上部且2N×1.5NPU在底部。As one possible implementation, the HM supports prediction with various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% segment is indicated by an "n" followed by an indication of "Up (U)", "Down (D)", "Left (L)", or "Right (R)". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
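The symmetric and asymmetric partition modes listed above can be illustrated with a small sketch that computes the PU dimensions for a 2N×2N CU. The mode names follow the HM convention quoted above; the function itself is invented for illustration and is not an implementation of any embodiment:

```python
def pu_sizes(mode, cu_size):
    """Return the (width, height) of each PU for a square CU of cu_size.

    cu_size corresponds to 2N, so cu_size // 4 corresponds to 0.5N
    (the 25% segment of the asymmetric modes).
    """
    q = cu_size // 4  # 0.5N: the 25% segment
    table = {
        "2Nx2N": [(cu_size, cu_size)],
        "NxN":   [(cu_size // 2, cu_size // 2)] * 4,
        "2NxN":  [(cu_size, cu_size // 2)] * 2,   # symmetric horizontal
        "Nx2N":  [(cu_size // 2, cu_size)] * 2,   # symmetric vertical
        "2NxnU": [(cu_size, q), (cu_size, 3 * q)],  # 25% on top
        "2NxnD": [(cu_size, 3 * q), (cu_size, q)],  # 25% at bottom
        "nLx2N": [(q, cu_size), (3 * q, cu_size)],  # 25% on the left
        "nRx2N": [(3 * q, cu_size), (q, cu_size)],  # 25% on the right
    }
    return table[mode]
```

For a 64×64 CU (2N = 64), mode "2NxnU" thus yields a 64×16 PU above a 64×48 PU, matching the 2N×0.5N / 2N×1.5N description above.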
在本申请中,“N×N”与“N乘N”可互换使用以指依照垂直维度及水平维度的视频块的像素尺寸,例如,16×16像素或16乘16像素。一般来说,16×16块将在垂直方向上具有16个像素(y=16),且在水平方向上具有16个像素(x=16)。同样地,N×N块一般在垂直方向上具有N个像素,且在水平方向上具有N个像素,其中N表示非负整数值。 可将块中的像素排列成行及列。此外,块未必需要在水平方向上与在垂直方向上具有相同数目个像素。例如,块可包括N×M个像素,其中M未必等于N。In the present application, "N x N" and "N by N" are used interchangeably to refer to the pixel size of a video block in accordance with the vertical dimension and the horizontal dimension, for example, 16 x 16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels (y = 16) in the vertical direction and 16 pixels (x = 16) in the horizontal direction. Likewise, an NxN block typically has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in the block can be arranged in rows and columns. Further, the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N x M pixels, where M is not necessarily equal to N.
在使用CU的PU的帧内预测性或帧间预测性解码之后，视频编码器20可计算CU的TU的残余数据(也被称为残差)。PU可包括空间域(也称作像素域)中的像素数据，且TU可包括在将变换(例如，离散余弦变换(discrete cosine transform,DCT)、整数变换、小波变换或概念上类似的变换)应用于残余视频数据形成变换域中的系数。残余数据可对应于未经编码图像的像素与PU的预测值之间的像素差。视频编码器20可形成包含CU的残余数据的TU，且接着变换TU以产生CU的变换系数。After intra-predictive or inter-predictive decoding using the PUs of a CU, video encoder 20 may calculate residual data (also referred to as a residual) for the TUs of the CU. A PU may comprise pixel data in the spatial domain (also referred to as the pixel domain), and a TU may comprise coefficients in the transform domain after a transform (e.g., a discrete cosine transform (discrete cosine transform, DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between the pixels of the unencoded image and the predicted values of the PU. Video encoder 20 may form TUs that include the residual data of the CU, and then transform the TUs to produce the transform coefficients of the CU.
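The residual computation described above (original pixels minus the PU's predicted values) can be sketched per pixel as follows; the helper is invented for illustration and omits the subsequent transform and quantization steps:

```python
def residual_block(original, prediction):
    """Per-pixel difference between the original block and the PU's
    predictive block; this residual is what is then transformed and
    quantized in the encoder."""
    return [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, prediction)]

# A 2x2 example block:
res = residual_block([[120, 121], [119, 118]],
                     [[118, 120], [119, 120]])
```

The decoder performs the inverse: it adds the reconstructed residual back onto the predictive block to reconstruct the pixels.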
在任何变换以产生变换系数之后,视频编码器20可执行变换系数的量化。量化示例性地指对系数进行量化以可能减少用以表示系数的数据的量从而提供进一步压缩的过程。量化过程可减少与系数中的一些或全部相关联的位深度。例如,可在量化期间将n位值降值舍位到m位值,其中n大于m。After any transform to generate transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression. The quantization process can reduce the bit depth associated with some or all of the coefficients. For example, the n-bit value can be rounded down to an m-bit value during quantization, where n is greater than m.
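The rounding-down of an n-bit value to an m-bit value mentioned above can be illustrated as follows. This is a deliberately simplified model of bit-depth reduction by dropping low-order bits; real codecs quantize by scaled division with a quantization step, so the functions here are purely explanatory:

```python
def quantize_to_m_bits(value, n, m):
    """Truncate an n-bit magnitude to m bits by dropping the low n-m bits
    (one simple model of the rounding-down described above)."""
    return value >> (n - m)

def dequantize(qvalue, n, m):
    """Scale the m-bit value back up; the dropped bits are lost,
    which is the source of quantization error."""
    return qvalue << (n - m)

q = quantize_to_m_bits(200, 8, 4)  # 8-bit 200 (0b11001000) -> 4-bit 12 (0b1100)
```

The reconstruction `dequantize(q, 8, 4)` gives 192 rather than 200, showing the lossy nature of quantization.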
JEM模型对视频图像的编码结构进行了进一步的改进，具体的，被称为“四叉树结合二叉树”(QTBT)的块编码结构被引入进来。QTBT结构摒弃了HEVC中的CU，PU，TU等概念，支持更灵活的CU划分形状，一个CU可以正方形，也可以是长方形。一个CTU首先进行四叉树划分，该四叉树的叶节点进一步进行二叉树划分。同时，在二叉树划分中存在两种划分模式，对称水平分割和对称竖直分割。二叉树的叶节点被称为CU，JEM的CU在预测和变换的过程中都不可以被进一步划分，也就是说JEM的CU，PU，TU具有相同的块大小。在现阶段的JEM中，CTU的最大尺寸为256×256亮度像素。The JEM model further improves the coding structure of video images; specifically, a block coding structure called "quadtree plus binary tree" (QTBT) is introduced. The QTBT structure does away with the concepts of CU, PU, and TU in HEVC and supports more flexible CU partition shapes: a CU may be square or rectangular. A CTU first undergoes quadtree partitioning, and the leaf nodes of the quadtree are further partitioned by a binary tree. There are two partition modes in binary tree partitioning: symmetric horizontal splitting and symmetric vertical splitting. The leaf nodes of the binary tree are called CUs; a JEM CU cannot be further partitioned during prediction and transform, which is to say that the CU, PU, and TU of the JEM have the same block size. In the current JEM, the maximum size of a CTU is 256×256 luma pixels.
在一些可行的实施方式中，视频编码器20可利用预定义扫描次序来扫描经量化变换系数以产生可经熵编码的串行化向量。在其它可行的实施方式中，视频编码器20可执行自适应性扫描。在扫描经量化变换系数以形成一维向量之后，视频编码器20可根据上下文自适应性可变长度解码(CAVLC)、上下文自适应性二进制算术解码(CABAC)、基于语法的上下文自适应性二进制算术解码(SBAC)、概率区间分割熵(PIPE)解码或其他熵解码方法来熵解码一维向量。视频编码器20也可熵编码与经编码视频数据相关联的语法元素以供视频解码器30用于解码视频数据。In some possible implementations, video encoder 20 may scan the quantized transform coefficients using a predefined scan order to produce a serialized vector that can be entropy encoded. In other possible implementations, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy-decode the one-dimensional vector according to context-adaptive variable-length decoding (CAVLC), context-adaptive binary arithmetic decoding (CABAC), syntax-based context-adaptive binary arithmetic decoding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy decoding method. Video encoder 20 may also entropy encode the syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
为了执行CABAC,视频编码器20可将上下文模型内的上下文指派给待传输的符号。上下文可与符号的相邻值是否为非零有关。为了执行CAVLC,视频编码器20可选择待传输的符号的可变长度码。可变长度解码(VLC)中的码字可经构建以使得相对较短码对应于可能性较大的符号,而较长码对应于可能性较小的符号。以这个方式,VLC的使用可相对于针对待传输的每一符号使用相等长度码字达成节省码率的目的。基于指派给符号的上下文可以确定CABAC中的概率。To perform CABAC, video encoder 20 may assign contexts within the context model to the symbols to be transmitted. The context can be related to whether the adjacent value of the symbol is non-zero. In order to perform CAVLC, video encoder 20 may select a variable length code of the symbol to be transmitted. Codewords in variable length decoding (VLC) may be constructed such that relatively shorter codes correspond to more likely symbols, while longer codes correspond to less likely symbols. In this way, the use of VLC can achieve the goal of saving code rate with respect to using equal length codewords for each symbol to be transmitted. The probability in CABAC can be determined based on the context assigned to the symbol.
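The rate saving of variable-length codes over equal-length codewords can be illustrated with a toy prefix-free code. The code table and symbol probabilities below are invented for the example and are not an actual CAVLC table:

```python
# Toy variable-length code: likelier symbols get shorter codewords.
vlc = {"a": "0", "b": "10", "c": "110", "d": "111"}   # prefix-free
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Expected bits per symbol under the VLC vs. a fixed 2-bit code
# (4 symbols would need 2 bits each with equal-length codewords).
avg_vlc = sum(probs[s] * len(vlc[s]) for s in vlc)
avg_fixed = 2

def encode(symbols):
    """Concatenate the codewords; prefix-freeness makes this decodable."""
    return "".join(vlc[s] for s in symbols)
```

With these probabilities the VLC averages 1.75 bits per symbol versus 2 for the fixed-length code, which is the rate saving the text describes; CABAC goes further by adapting symbol probabilities from context.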
在本申请实施例中，视频编码器可执行帧间预测以减少图像之间的时间冗余。如前文所描述，根据不同视频压缩编解码标准的规定，CU可具有一个或多个预测单元PU。换句话说，多个PU可属于CU，或者PU和CU的尺寸相同。在本文中当CU和PU尺寸相同时，CU的分割模式为不分割，或者即为分割为一个PU，且统一使用PU进行表述。当视频编码器执行帧间预测时，视频编码器可用信号通知视频解码器用于PU的运动信息。示例性的，PU的运动信息可以包括：参考图像索引、运动矢量和预测方向标识。运动矢量可指示PU的图像块(也称视频块、像素块、像素集合等)与PU的参考图像块之间的位移。PU的参考图像块可为类似于PU的图像块的参考图像的一部分。参考图像块可定位于由参考图像索引和预测方向标识指示的参考图像中。In the embodiments of this application, the video encoder may perform inter prediction to reduce temporal redundancy between images. As described above, a CU may have one or more prediction units PU according to the provisions of different video compression codec standards. In other words, multiple PUs may belong to a CU, or the PU and the CU may have the same size. In this document, when the CU and the PU have the same size, the partition mode of the CU is no partitioning, or the CU is partitioned into one PU, and "PU" is used uniformly for the description. When the video encoder performs inter prediction, the video encoder may signal motion information for the PU to the video decoder. Illustratively, the motion information of a PU may include: a reference image index, a motion vector, and a prediction direction identifier. The motion vector may indicate the displacement between the image block (also called a video block, pixel block, pixel set, etc.) of the PU and the reference image block of the PU. The reference image block of the PU may be a portion of the reference image that is similar to the image block of the PU. The reference image block may be located in the reference image indicated by the reference image index and the prediction direction identifier.
为了减少表示PU的运动信息所需要的编码比特的数目，视频编码器可根据合并预测模式或高级运动矢量预测模式过程产生用于PU中的每一者的候选预测运动矢量(Motion Vector,MV)列表。用于PU的候选预测运动矢量列表中的每一候选预测运动矢量可指示运动信息。由候选预测运动矢量列表中的一些候选预测运动矢量指示的运动信息可基于其它PU的运动信息。如果候选预测运动矢量指示指定空间候选预测运动矢量位置或时间候选预测运动矢量位置中的一者的运动信息，则本申请可将所述候选预测运动矢量称作“原始”候选预测运动矢量。举例来说，对于合并模式，在本文中也称为合并预测模式，可存在五个原始空间候选预测运动矢量位置和一个原始时间候选预测运动矢量位置。在一些实例中，视频编码器可通过组合来自不同原始候选预测运动矢量的部分运动矢量、修改原始候选预测运动矢量或仅插入零运动矢量作为候选预测运动矢量来产生额外候选预测运动矢量。这些额外候选预测运动矢量不被视为原始候选预测运动矢量且在本申请中可称作人工产生的候选预测运动矢量。In order to reduce the number of coded bits needed to represent the motion information of the PUs, the video encoder may generate a list of candidate predicted motion vectors (Motion Vector, MV) for each of the PUs according to a merge prediction mode or an advanced motion vector prediction mode process. Each candidate predicted motion vector in the candidate predicted motion vector list for a PU may indicate motion information. The motion information indicated by some candidate predicted motion vectors in the candidate predicted motion vector list may be based on the motion information of other PUs. If a candidate predicted motion vector indicates motion information specifying one of a spatial candidate predicted motion vector position or a temporal candidate predicted motion vector position, this application may refer to that candidate predicted motion vector as an "original" candidate predicted motion vector. For example, for the merge mode, also referred to herein as the merge prediction mode, there may be five original spatial candidate predicted motion vector positions and one original temporal candidate predicted motion vector position. In some examples, the video encoder may generate additional candidate predicted motion vectors by combining partial motion vectors from different original candidate predicted motion vectors, modifying original candidate predicted motion vectors, or simply inserting zero motion vectors as candidate predicted motion vectors. These additional candidate predicted motion vectors are not regarded as original candidate predicted motion vectors, and may be referred to in this application as artificially generated candidate predicted motion vectors.
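The list construction with selective pruning described in the surrounding paragraphs may be sketched as follows. This is a simplification invented for illustration: real candidate derivation also involves reference indices, availability checks, and position-dependent ordering, none of which are modelled here.

```python
def build_candidate_list(spatial, temporal, max_len=5):
    """Build a fixed-length candidate predicted motion vector list.

    Spatial and temporal candidates are pruned for duplicates; artificially
    generated zero-MV candidates are then appended WITHOUT pruning, as
    described above, until the list reaches max_len.
    """
    cands = []
    for mv in spatial + temporal:        # original candidates
        if mv not in cands and len(cands) < max_len:  # pruning: drop repeats
            cands.append(mv)
    while len(cands) < max_len:
        cands.append((0, 0))             # artificial candidates: no pruning
    return cands

# Two of the three spatial candidates and the temporal candidate coincide,
# so pruning keeps only two distinct originals; zeros fill the rest.
lst = build_candidate_list([(4, 0), (4, 0), (2, -2)], [(4, 0)])
```

Because encoder and decoder run the identical construction procedure, the index signalled in the code stream selects the same candidate on both sides.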
本申请的技术一般涉及用于在视频编码器处产生候选预测运动矢量列表的技术和用于在视频解码器处产生相同候选预测运动矢量列表的技术。视频编码器和视频解码器可通过实施用于构建候选预测运动矢量列表的相同技术来产生相同候选预测运动矢量列表。举例来说，视频编码器和视频解码器两者可构建具有相同数目的候选预测运动矢量(例如，五个候选预测运动矢量)的列表。视频编码器和解码器可首先考虑空间候选预测运动矢量(例如，同一图像中的相邻块)，接着考虑时间候选预测运动矢量(例如，不同图像中的候选预测运动矢量)，且最后可考虑人工产生的候选预测运动矢量直到将所要数目的候选预测运动矢量添加到列表为止。根据本申请的技术，可在候选预测运动矢量列表构建期间针对某些类型的候选预测运动矢量利用修剪操作以便从候选预测运动矢量列表移除重复，而对于其它类型的候选预测运动矢量，可能不使用修剪以便减小解码器复杂性。举例来说，对于空间候选预测运动矢量集合和对于时间候选预测运动矢量，可执行修剪操作以从候选预测运动矢量的列表排除具有重复运动信息的候选预测运动矢量。然而，当将人工产生的候选预测运动矢量添加到候选预测运动矢量的列表时，可在不对人工产生的候选预测运动矢量执行修剪操作的情况下添加人工产生的候选预测运动矢量。The techniques of this application relate generally to techniques for generating a candidate predicted motion vector list at a video encoder and techniques for generating the same candidate predicted motion vector list at a video decoder. The video encoder and the video decoder may generate the same candidate predicted motion vector list by implementing the same techniques for constructing the candidate predicted motion vector list. For example, both the video encoder and the video decoder may construct lists with the same number of candidate predicted motion vectors (e.g., five candidate predicted motion vectors). The video encoder and decoder may first consider spatial candidate predicted motion vectors (e.g., from neighboring blocks in the same image), then consider temporal candidate predicted motion vectors (e.g., candidate predicted motion vectors in different images), and finally may consider artificially generated candidate predicted motion vectors, until the desired number of candidate predicted motion vectors has been added to the list. According to the techniques of this application, during candidate predicted motion vector list construction, a pruning operation may be applied to certain types of candidate predicted motion vectors in order to remove duplicates from the candidate predicted motion vector list, while for other types of candidate predicted motion vectors pruning may not be used, in order to reduce decoder complexity. For example, for the set of spatial candidate predicted motion vectors and for the temporal candidate predicted motion vector, a pruning operation may be performed to exclude candidate predicted motion vectors with duplicate motion information from the list of candidate predicted motion vectors. However, when an artificially generated candidate predicted motion vector is added to the list of candidate predicted motion vectors, the artificially generated candidate predicted motion vector may be added without performing the pruning operation on it.
在产生用于CU的PU的候选预测运动矢量列表之后,视频编码器可从候选预测运动矢量列表选择候选预测运动矢量且在码流中输出候选预测运动矢量索引。选定候选预测运动矢量可为具有产生最紧密地匹配正被解码的目标PU的预测子的运动矢量的候选预测运动矢量。候选预测运动矢量索引可指示在候选预测运动矢量列表中选定候选预测运动矢量的位置。视频编码器还可基于由PU的运动信息指示的参考图像块产生用于PU的预测性图像块。可基于由选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,PU的运动信息可基于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定。视频编码器可基于CU的PU的预测性图像块和用于CU的原始图像块产生用于CU的一或多个残余图像块。视频编码器可接着编码 一或多个残余图像块且在码流中输出一或多个残余图像块。After generating a candidate prediction motion vector list for the PU of the CU, the video encoder may select a candidate prediction motion vector from the candidate prediction motion vector list and output a candidate prediction motion vector index in the code stream. The selected candidate predicted motion vector may be a candidate predicted motion vector having a motion vector that produces a predictor that most closely matches the target PU being decoded. The candidate predicted motion vector index may indicate the location of the candidate predicted motion vector selected in the candidate predicted motion vector list. The video encoder may also generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate predicted motion vector. For example, in the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate predicted motion vector. The video encoder may generate one or more residual image blocks for the CU based on the predictive image block of the PU of the CU and the original image block for the CU. 
The video encoder may then encode the one or more residual image blocks and output them in the code stream.
The code stream may include data identifying the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU. The video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU. The video decoder may identify one or more reference image blocks for the PU based on the motion information of the PU. After identifying the one or more reference image blocks of the PU, the video decoder may generate a predictive image block for the PU based on the one or more reference image blocks of the PU. The video decoder may reconstruct the image block for the CU based on the predictive image blocks for the PUs of the CU and the one or more residual image blocks for the CU.
For ease of explanation, the present application may describe a location or image block as having various spatial relationships with a CU or PU. This description may be interpreted to mean that the location or image block has various spatial relationships with the image block associated with the CU or PU. In addition, the present application may refer to the PU currently being decoded by the video decoder as the current PU, also referred to as the current image block to be processed; to the CU currently being decoded by the video decoder as the current CU; and to the image currently being decoded by the video decoder as the current image. It should be understood that the present application is equally applicable to the case where the PU and the CU have the same size, or where the PU is the CU; in such cases, the term PU is used uniformly.
As briefly described above, video encoder 20 may use inter prediction to generate predictive image blocks and motion information for the PUs of a CU. In many examples, the motion information of a given PU may be the same as or similar to the motion information of one or more nearby PUs (i.e., PUs whose image blocks are spatially or temporally near the image block of the given PU). Because nearby PUs often have similar motion information, video encoder 20 may encode the motion information of a given PU with reference to the motion information of nearby PUs. Doing so may reduce the number of coded bits required in the code stream to indicate the motion information of the given PU.
Video encoder 20 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various manners. For example, video encoder 20 may indicate that the motion information of the given PU is the same as the motion information of a nearby PU. The present application uses the merge mode to refer to indicating that the motion information of a given PU is the same as, or can be derived from, the motion information of a nearby PU. In another feasible implementation, video encoder 20 may calculate a motion vector difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU. Video encoder 20 may include the MVD, instead of the motion vector of the given PU, in the motion information of the given PU. Representing the MVD in the code stream requires fewer coded bits than representing the motion vector of the given PU. The present application uses the advanced motion vector prediction (AMVP) mode to refer to signalling the motion information of a given PU to the decoding end by using the MVD and an index value identifying the candidate motion vector.
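The MVD relationship described above can be sketched in a few lines. The function names and tuple representation are illustrative assumptions; only the arithmetic (difference at the encoder, sum at the decoder) reflects the text.

```python
# Illustrative sketch of the AMVP idea: the encoder sends a motion vector
# difference (MVD) plus a candidate index instead of the full motion vector.
def encode_mvd(mv, predictor):
    """MVD = motion vector of the given PU minus the predicted motion vector."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def decode_mv(mvd, predictor):
    """The decoder reconstructs the motion vector from the signalled MVD."""
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])

mv, predictor = (13, -7), (12, -8)
mvd = encode_mvd(mv, predictor)   # small values typically cost fewer bits
assert decode_mv(mvd, predictor) == mv
print(mvd)  # (1, 1)
```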
To signal the motion information of a given PU to the decoding end using the merge mode or the AMVP mode, video encoder 20 may generate a candidate predicted motion vector list for the given PU. The candidate predicted motion vector list may include one or more candidate predicted motion vectors. Each of the candidate predicted motion vectors in the list for the given PU may specify motion information. The motion information indicated by each candidate predicted motion vector may include a motion vector, a reference image index, and a prediction direction identifier. The candidate predicted motion vectors in the list may include "original" candidate predicted motion vectors, each of which indicates the motion information of one of the specified candidate predicted motion vector positions within a PU different from the given PU.
After generating the candidate predicted motion vector list for the PU, video encoder 20 may select one of the candidate predicted motion vectors from the list. For example, the video encoder may compare each candidate predicted motion vector with the PU being decoded and may select the candidate predicted motion vector having the desired rate-distortion cost. Video encoder 20 may output a candidate predicted motion vector index for the PU. The candidate predicted motion vector index may identify the position of the selected candidate predicted motion vector in the candidate predicted motion vector list.
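A minimal sketch of this selection step follows: among the candidates in the list, the one with the lowest rate-distortion cost (RDcost = D + lambda * R) is chosen. The cost model, distortion measure, and bit model here are placeholders, not the codec's actual ones.

```python
# Pick the candidate minimising D + lambda * R over the candidate list.
def select_candidate(candidates, distortion, bits, lam):
    """Return (index, candidate) minimising the rate-distortion cost."""
    costs = [distortion(c) + lam * bits(i) for i, c in enumerate(candidates)]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return best, candidates[best]

cands = [(0, 0), (4, -2), (1, 3)]
# toy distortion: squared distance to an assumed "true" motion (4, -1)
dist = lambda c: (c[0] - 4) ** 2 + (c[1] + 1) ** 2
bits = lambda i: i + 1        # later list positions cost more bits to signal
idx, cand = select_candidate(cands, dist, bits, lam=0.5)
print(idx, cand)  # 1 (4, -2)
```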
In addition, video encoder 20 may generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. For example, in the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate predicted motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference for the PU and the motion information indicated by the selected candidate predicted motion vector. Video encoder 20 may process the predictive image block for the PU as described above.
When video decoder 30 receives the code stream, video decoder 30 may generate a candidate predicted motion vector list for each PU of the CU. The candidate predicted motion vector list generated by video decoder 30 for a PU may be the same as the candidate predicted motion vector list generated by video encoder 20 for that PU. A syntax element parsed from the code stream may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU. After generating the candidate predicted motion vector list for the PU, video decoder 30 may generate a predictive image block for the PU based on the one or more reference image blocks indicated by the motion information of the PU. Video decoder 30 may determine the motion information of the PU based on the motion information indicated by the selected candidate predicted motion vector in the candidate predicted motion vector list for the PU. Video decoder 30 may reconstruct the image block for the CU based on the predictive image block for the PU and the residual image block for the CU.
It should be understood that, in a feasible implementation, at the decoding end, the construction of the candidate predicted motion vector list and the parsing, from the code stream, of the position of the selected candidate predicted motion vector in that list are independent of each other, and may be performed in any order or in parallel.
In another feasible implementation, at the decoding end, the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the code stream, and the candidate predicted motion vector list is then constructed according to the parsed position. In this implementation, it is not necessary to construct the entire candidate predicted motion vector list; the list only needs to be constructed up to the parsed position, that is, to the point where the candidate predicted motion vector at that position can be determined. For example, when parsing the code stream reveals that the selected candidate predicted motion vector is the one with index 3 in the list, only the candidates from index 0 to index 3 need to be constructed to determine the candidate predicted motion vector with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
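The early-termination idea above can be sketched as follows. The candidate generator is a stand-in for the actual spatial/temporal candidate derivation; only the stop-at-parsed-index behaviour reflects the text.

```python
# Build the candidate list only up to the index parsed from the code stream.
def build_list_up_to(index, candidate_generator):
    """Construct candidates 0..index and return the one at `index`."""
    lst = []
    for cand in candidate_generator:
        lst.append(cand)
        if len(lst) == index + 1:   # stop once the parsed position is reached
            return lst[index]
    raise ValueError("index beyond available candidates")

gen = iter([(0, 0), (4, -2), (1, 3), (7, 7), (9, 9)])
print(build_list_up_to(3, gen))   # only four candidates constructed, not five
```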
It should be noted that the construction of the motion vector list is not only used in the merge or AMVP techniques described above, but is also common to various inter and intra prediction techniques related to motion estimation (ME).
In the feasible implementation of FIG. 2, video encoder 20 includes a partitioning unit 35, a prediction unit 41, a reference image memory 64, a summer 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The prediction unit 41 includes a motion estimation unit 42, a motion compensation unit 44, and an intra prediction module 46. For video block reconstruction, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and a summer 62. A deblocking filter (not shown in FIG. 2) may also be included to filter block boundaries so as to remove blocking artifacts from the reconstructed video. When needed, the deblocking filter typically filters the output of summer 62. In addition to the deblocking filter, additional loop filters (in-loop or post-loop) may also be used.
As shown in FIG. 2, video encoder 20 receives video data, and partitioning unit 35 partitions the data into video blocks. This partitioning may also include partitioning into slices, image blocks, or other larger units, as well as video block partitioning, for example, according to the quadtree structure of LCUs and CUs. Video encoder 20 exemplarily illustrates the components that encode video blocks within a video slice to be encoded. In general, a slice may be divided into multiple video blocks (and possibly into collections of video blocks referred to as image blocks).
Prediction unit 41 may select one of a plurality of possible decoding modes for the current video block, such as one of a plurality of intra decoding modes or one of a plurality of inter decoding modes, based on the encoding quality and a cost calculation result (for example, the rate-distortion cost, RDcost). Prediction unit 41 may provide the resulting intra-decoded or inter-decoded block to summer 50 to generate residual block data, and provide the resulting intra-decoded or inter-decoded block to summer 62 to reconstruct the encoded block for use as a reference image.
Motion estimation unit 42 and motion compensation unit 44 within prediction unit 41 perform inter-predictive decoding of the current video block relative to one or more predictive blocks in one or more reference images to provide temporal compression. Motion estimation unit 42 may be configured to determine an inter prediction mode for a video slice according to a predetermined pattern for the video sequence. The predetermined pattern may designate the video slices in the sequence as P slices, B slices, or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. The motion estimation performed by motion estimation unit 42 is the process of generating motion vectors that estimate video blocks. For example, a motion vector may indicate the displacement of a PU of a video block within the current video frame or image relative to a predictive block within a reference image.
A predictive block is a block found to closely match, in terms of pixel difference, the PU of the video block to be decoded; the pixel difference may be determined by a sum of absolute differences, a sum of squared differences, or other difference metrics. In some feasible implementations, video encoder 20 may calculate values of sub-integer pixel positions of reference images stored in reference image memory 64. For example, video encoder 20 may interpolate values of quarter-pixel positions, eighth-pixel positions, or other fractional pixel positions of a reference image. Therefore, motion estimation unit 42 may perform a motion search with respect to full pixel positions and fractional pixel positions, and output a motion vector with fractional pixel precision.
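The sub-integer pixel values mentioned above are interpolated from full-pel samples. The sketch below estimates a half-pel sample by simple bilinear averaging; real codecs use longer interpolation filters, so this only illustrates why motion vectors can carry fractional-pel precision.

```python
# Hedged sketch of sub-integer pixel interpolation (bilinear averaging).
def half_pel(row, x):
    """Value midway between full-pel positions x and x+1 in a pixel row."""
    return (row[x] + row[x + 1]) / 2.0

row = [10, 14, 20, 26]
print(half_pel(row, 1))   # interpolated sample at fractional position 1.5
```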
Motion estimation unit 42 calculates the motion vector of a PU of a video block in an inter-decoded slice by comparing the position of the PU with the position of a predictive block of a reference image. The reference image may be selected from a first reference image list (List 0) or a second reference image list (List 1), each of which identifies one or more reference images stored in reference image memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.
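The position comparison above can be illustrated by a toy full-search motion estimation: the motion vector is the displacement to the best-matching block in the reference image, scored here with the sum of absolute differences (SAD). Frame layout, search range, and the exhaustive search strategy are assumptions made for the example.

```python
# Toy full-search motion estimation over a small search window.
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
                          for a, b in zip(ra, rb))

def motion_search(cur, ref, x, y, size, rng):
    """Return (dx, dy) minimising SAD within +/- rng around (x, y)."""
    cur_blk = [r[x:x + size] for r in cur[y:y + size]]
    best = None
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= len(ref[0]) - size and 0 <= ry <= len(ref) - size:
                ref_blk = [r[rx:rx + size] for r in ref[ry:ry + size]]
                cost = sad(cur_blk, ref_blk)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

ref = [[0] * 8 for _ in range(8)]
ref[2][3] = ref[2][4] = ref[3][3] = ref[3][4] = 100   # bright 2x2 patch
cur = [[0] * 8 for _ in range(8)]
cur[4][4] = cur[4][5] = cur[5][4] = cur[5][5] = 100   # same patch, displaced
print(motion_search(cur, ref, 4, 4, 2, 2))  # (-1, -2)
```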
Motion compensation performed by motion compensation unit 44 may involve extracting or generating a predictive block based on the motion vector determined by motion estimation, possibly performing interpolation to sub-pixel precision. After receiving the motion vector of the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference image lists. Video encoder 20 forms a residual video block by subtracting the pixel values of the predictive block from the pixel values of the current video block being decoded, thereby forming pixel difference values. The pixel difference values form the residual data of the block, and may include both luminance and chrominance difference components. Summer 50 represents the one or more components that perform this subtraction. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.
If a PU is located in a B slice, the image containing the PU may be associated with two reference image lists referred to as "List 0" and "List 1". In some feasible implementations, an image containing a B slice may be associated with a list combination that is a combination of List 0 and List 1.
In addition, if a PU is located in a B slice, motion estimation unit 42 may perform unidirectional prediction or bidirectional prediction for the PU. In some feasible implementations, bidirectional prediction is prediction based on images of the List 0 and List 1 reference image lists, respectively; in other feasible implementations, bidirectional prediction is prediction based on a reconstructed future frame and a reconstructed past frame of the current frame in display order, respectively. When motion estimation unit 42 performs unidirectional prediction for the PU, motion estimation unit 42 may search the reference images of List 0 or List 1 for a reference image block for the PU. Motion estimation unit 42 may then generate a reference index indicating the reference image in List 0 or List 1 that contains the reference image block, and a motion vector indicating the spatial displacement between the PU and the reference image block. Motion estimation unit 42 may output the reference index, a prediction direction identifier, and the motion vector as the motion information of the PU. The prediction direction identifier may indicate whether the reference index refers to a reference image in List 0 or List 1. Motion compensation unit 44 may generate a predictive image block of the PU based on the reference image block indicated by the motion information of the PU.
When motion estimation unit 42 performs bidirectional prediction for the PU, motion estimation unit 42 may search the reference images in List 0 for a reference image block for the PU and may also search the reference images in List 1 for another reference image block for the PU. Motion estimation unit 42 may then generate reference indices indicating the reference images in List 0 and List 1 that contain the reference image blocks, and motion vectors indicating the spatial displacements between the reference image blocks and the PU. Motion estimation unit 42 may output the reference indices and the motion vectors of the PU as the motion information of the PU. Motion compensation unit 44 may generate a predictive image block of the PU based on the reference image blocks indicated by the motion information of the PU.
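When both reference image blocks are available, the two predictions can be combined into a single predictive block. The sketch below uses simple sample-wise averaging; weighted prediction is also possible, and the averaging here is an illustrative assumption rather than the codec's exact combination rule.

```python
# Sketch of bidirectional prediction: combine the List 0 and List 1
# predictive blocks by sample-wise integer averaging.
def bi_predict(block_l0, block_l1):
    """Average the two predictive blocks sample by sample."""
    return [[(a + b) // 2 for a, b in zip(r0, r1)]
            for r0, r1 in zip(block_l0, block_l1)]

p0 = [[100, 102], [104, 106]]
p1 = [[110, 108], [106, 104]]
print(bi_predict(p0, p1))   # [[105, 105], [105, 105]]
```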
In some feasible implementations, motion estimation unit 42 does not output the complete set of motion information for the PU to entropy encoding module 56. Rather, motion estimation unit 42 may signal the motion information of the PU with reference to the motion information of another PU. For example, motion estimation unit 42 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighbouring PU. In this implementation, motion estimation unit 42 may indicate, in the syntax structure associated with the PU, an indication value that indicates to video decoder 30 that the PU has the same motion information as the neighbouring PU, or has motion information that can be derived from the neighbouring PU. In another implementation, motion estimation unit 42 may identify, in the syntax structure associated with the PU, a candidate predicted motion vector associated with the neighbouring PU and a motion vector difference (MVD). The MVD indicates the difference between the motion vector of the PU and the indicated candidate predicted motion vector associated with the neighbouring PU. Video decoder 30 may use the indicated candidate predicted motion vector and the MVD to determine the motion vector of the PU.
As described above, prediction module 41 may generate a candidate predicted motion vector list for each PU of the CU. One or more of the candidate predicted motion vector lists may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors.
Intra prediction unit 46 within prediction unit 41 may perform intra-predictive decoding of the current video block relative to one or more neighbouring blocks in the same image or slice as the current block to be decoded, to provide spatial compression. Therefore, as an alternative to the inter prediction performed by motion estimation unit 42 and motion compensation unit 44 (as described above), intra prediction unit 46 may intra-predict the current block. Specifically, intra prediction unit 46 may determine the intra prediction mode used to encode the current block. In some feasible implementations, intra prediction unit 46 may, for example, encode the current block using various intra prediction modes during separate encoding passes, and intra prediction unit 46 (or, in some feasible implementations, mode selection unit 40) may select an appropriate intra prediction mode to use from the tested modes.
After prediction unit 41 generates the predictive block of the current video block via inter prediction or intra prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform (for example, a discrete sine transform, DST). Transform processing unit 52 may convert the residual video data from the pixel domain to a transform domain (for example, the frequency domain).
Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting the quantization parameter. In some feasible implementations, quantization unit 54 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
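The quantization step above, and the matching inverse quantization performed later in the reconstruction path, can be sketched as follows. The direct division by a step size is a simplification; the actual mapping from quantization parameter to step size in a real codec is more involved.

```python
# Illustrative quantisation/dequantisation: a larger step discards more
# precision (reducing the rate) at the cost of reconstruction error.
def quantize(coeffs, step):
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, step):
    return [l * step for l in levels]

coeffs = [52.0, -7.0, 3.0, 0.4]
levels = quantize(coeffs, step=4)      # [13, -2, 1, 0]
recon = dequantize(levels, step=4)     # [52, -8, 4, 0] -- lossy
print(levels, recon)
```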
After quantization, entropy encoding unit 56 may entropy encode the quantized transform coefficients. For example, entropy encoding unit 56 may perform context-adaptive variable-length decoding, context-adaptive binary arithmetic decoding, syntax-based context-adaptive binary arithmetic decoding, probability interval partitioning entropy decoding, or another entropy encoding method or technique. Entropy encoding unit 56 may also entropy encode the motion vectors and other syntax elements of the current video slice being decoded. After entropy encoding by entropy encoding unit 56, the encoded code stream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30.
Entropy encoding unit 56 may encode information indicating the selected intra prediction mode according to the techniques of the present application. Video encoder 20 may include, in the transmitted code stream configuration data, which may include multiple intra prediction mode index tables and multiple modified intra prediction mode index tables (also referred to as codeword mapping tables), definitions of the encoding contexts of various blocks, and indications of the MPM, the intra prediction mode index table, and the modified intra prediction mode index table for each of the contexts.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain for later use as a reference image block of a reference image. Motion compensation unit 44 may calculate the reference image block by adding the residual block to the predictive block of one of the reference images in one of the reference image lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Summer 62 adds the reconstructed residual block to the motion-compensated prediction block generated by motion compensation unit 44 to produce a reference image block for storage in reference image memory 64. The reference image block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference image block for inter-predicting blocks in subsequent video frames or images.
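The addition performed by summer 62 can be sketched as follows: the reconstructed residual is added back to the motion-compensated prediction. The clipping to an 8-bit sample range is an assumption added for the example, since stored samples must stay within the bit depth.

```python
# Minimal sketch of the reconstruction path around summer 62.
def reconstruct(pred, residual, max_val=255):
    """Add residual to prediction, clipping to the valid sample range."""
    return [[min(max(p + r, 0), max_val) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]

pred = [[120, 130], [140, 250]]
residual = [[5, -3], [0, 10]]
print(reconstruct(pred, residual))   # [[125, 127], [140, 255]]
```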
FIG. 3 is a schematic block diagram of video decoder 30 in an embodiment of the present application. In the feasible implementation of FIG. 3, video decoder 30 includes an entropy encoding unit 80, a prediction unit 81, an inverse quantization unit 86, an inverse transform unit 88, a summer 90, and a reference image memory 92. Prediction unit 81 includes a motion compensation unit 82 and an intra prediction unit 84. In some feasible implementations, video decoder 30 may perform a decoding process that is exemplarily reciprocal to the encoding process described with respect to video encoder 20 of FIG. 4.
During the decoding process, video decoder 30 receives from video encoder 20 an encoded video code stream representing the video blocks of an encoded video slice and the associated syntax elements. Entropy encoding unit 80 of video decoder 30 entropy decodes the code stream to produce quantized coefficients, motion vectors, and other syntax elements. Entropy encoding unit 80 forwards the motion vectors and other syntax elements to prediction unit 81. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When a video slice is decoded as an intra-decoded slice, intra prediction unit 84 of prediction unit 81 may generate prediction data for the video blocks of the current video slice based on the signalled intra prediction mode and data from previously decoded blocks of the current frame or image.
When a video image is decoded as an inter-decoded slice, motion compensation unit 82 of prediction unit 81 generates predictive blocks for the video blocks of the current video image based on the motion vectors and other syntax elements received from entropy encoding unit 80. A predictive block may be generated from one of the reference images in one of the reference image lists. Video decoder 30 may construct the reference image lists (List 0 and List 1) using default construction techniques based on the reference images stored in reference image memory 92.
Motion compensation unit 82 determines the prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to generate the predictive block of the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine the prediction mode (for example, intra prediction or inter prediction) used to decode the video blocks of the video slice, the inter prediction slice type, the construction information for one or more of the reference image lists of the slice, the motion vector of each inter-encoded video block of the slice, the inter prediction status of each inter-decoded video block of the slice, and other information used to decode the video blocks in the current video slice.
Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use the interpolation filters used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference image block. In this application, motion compensation unit 82 may determine, from the received syntax elements, the interpolation filters used by video encoder 20, and use those interpolation filters to produce the predictive blocks.
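The sub-integer interpolation described above can be illustrated with a deliberately simple filter. The sketch below uses a 2-tap (bilinear) filter with rounding; real codecs such as HEVC use longer separable filters (e.g., 8-tap for luma), so the filter choice and function name here are illustrative assumptions only.

```python
# Half-pel interpolation sketch using a 2-tap (bilinear) filter with
# rounding. HEVC actually uses longer separable filters; this only
# illustrates the idea of producing sub-integer sample values.
def half_pel(samples):
    """Return the interpolated value midway between each pair of
    neighboring integer-position samples."""
    return [(a + b + 1) // 2 for a, b in zip(samples, samples[1:])]

halves = half_pel([0, 2, 4])
```

A motion vector with 1/2-pixel precision would then reference these interpolated positions rather than the integer-position samples.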
If a PU is encoded using inter prediction, motion compensation unit 82 may generate a candidate predicted motion vector list for the PU. The bitstream may include data identifying the position of the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU. After generating the candidate predicted motion vector list for the PU, motion compensation unit 82 may generate a predictive image block for the PU based on the one or more reference image blocks indicated by the motion information of the PU. The reference image block of the PU may be in a temporally different picture than the PU. Motion compensation unit 82 may determine the motion information of the PU based on the motion information selected from the candidate predicted motion vector list of the PU.
Inverse quantization unit 86 inverse quantizes (i.e., dequantizes) the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include using a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine the degree of quantization and, likewise, the degree of inverse quantization that should be applied. Inverse transform unit 88 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce residual blocks in the pixel domain.
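The dequantization step can be sketched as follows. The step-size formula below is a common simplification (the step size roughly doubles every 6 QP steps); the exact HEVC scaling with quantization matrices and bit-depth offsets is omitted, and the function name is illustrative.

```python
# Illustrative sketch of inverse quantization (dequantization).
# The exact HEVC scaling process is more involved; this shows only
# the core idea of rescaling coefficients by a QP-derived step size.
def dequantize(quantized_coeffs, qp):
    """Scale quantized transform coefficients back toward their
    original magnitudes. Larger QP means a coarser (larger) step."""
    # Simplified step size: 2^((qp - 4) / 6), an approximation of the
    # QP-to-step-size relationship used in H.264/HEVC-style codecs.
    step = 2 ** ((qp - 4) / 6.0)
    return [round(c * step) for c in quantized_coeffs]

coeffs = dequantize([10, -3, 0, 1], qp=10)
```

The rescaled coefficients would then be passed to the inverse transform to produce the pixel-domain residual block.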
After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual block from inverse transform unit 88 with the corresponding predictive block generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the decoding loop or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The decoded video blocks in a given frame or picture are then stored in reference picture memory 92, which stores the reference pictures used for subsequent motion compensation. Reference picture memory 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1.
As noted above, the techniques of the present application relate, by way of example, to inter decoding. It should be understood that the techniques of the present application may be performed by any of the video decoders described in this application, including video encoder 20 and video decoder 30 as shown and described with respect to FIGS. 1 to 3. That is, in one feasible implementation, prediction unit 41 described with respect to FIG. 2 may perform the particular techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, prediction unit 81 described with respect to FIG. 3 may perform the particular techniques described below when performing inter prediction during decoding of a block of video data. Therefore, a reference to a generic "video encoder" or "video decoder" may include video encoder 20, video decoder 30, or another video encoding or decoding unit.
FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of the present application. The inter prediction module 121 may, for example, include a motion estimation unit 42 and a motion compensation unit 44. The relationship between PU and CU differs among video compression coding standards. The inter prediction module 121 may partition the current CU into PUs according to multiple partition modes. For example, the inter prediction module 121 may partition the current CU into PUs according to the 2N×2N, 2N×N, N×2N, and N×N partition modes. In other embodiments, the current CU is the current PU; this is not limited herein.
The inter prediction module 121 may perform integer motion estimation (IME) on each of the PUs and then perform fractional motion estimation (FME). When the inter prediction module 121 performs IME on a PU, the inter prediction module 121 may search one or more reference pictures for a reference image block for the PU. After finding the reference image block for the PU, the inter prediction module 121 may generate, with integer precision, a motion vector indicating the spatial displacement between the PU and the reference image block for the PU. When the inter prediction module 121 performs FME on the PU, the inter prediction module 121 may refine the motion vector generated by performing IME on the PU. A motion vector generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.). After generating the motion vector for the PU, the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
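The two-stage search described above (IME followed by FME refinement) can be sketched on a one-dimensional toy example. The SAD cost, the dictionary of pre-interpolated reference positions, and the fixed offset set below are illustrative assumptions, not the actual search strategy of any encoder.

```python
# Toy sketch of integer motion estimation (IME) followed by fractional
# refinement (FME). Positions index a 1-D reference; `ref` maps each
# available (integer or sub-pel) position to its sample values.
def sad(a, b):
    """Sum of absolute differences, a common block-matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))

def integer_me(block, ref, positions):
    """IME: pick the integer-pel candidate with the lowest SAD."""
    return min(positions, key=lambda p: sad(block, ref[p]))

def fractional_me(best_int, ref, block,
                  offsets=(-0.5, -0.25, 0, 0.25, 0.5)):
    """FME: refine around the IME result at those sub-pel offsets for
    which interpolated reference samples exist in `ref`."""
    candidates = [best_int + o for o in offsets if (best_int + o) in ref]
    return min(candidates, key=lambda p: sad(block, ref[p]))

ref = {0: [1, 2], 1: [5, 6], 1.5: [4, 5]}
block = [4, 5]
mv_int = integer_me(block, ref, [0, 1])      # integer-precision MV
mv_frac = fractional_me(mv_int, ref, block)  # refined sub-pel MV
```

Here the integer search lands on position 1, and the fractional refinement moves to 1.5, where the interpolated reference matches the block exactly.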
In some feasible implementations in which the inter prediction module 121 signals the motion information of the PU to the decoder using the AMVP mode, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors. After generating the candidate predicted motion vector list for the PU, the inter prediction module 121 may select a candidate predicted motion vector from the candidate predicted motion vector list and generate a motion vector difference (MVD) for the PU. The MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate predicted motion vector and the motion vector generated for the PU using IME and FME. In these feasible implementations, the inter prediction module 121 may output a candidate predicted motion vector index identifying the position of the selected candidate predicted motion vector in the candidate predicted motion vector list. The inter prediction module 121 may also output the MVD of the PU. FIG. 6, described in detail below, illustrates a feasible implementation of the advanced motion vector prediction (AMVP) mode in an embodiment of the present application.
In addition to generating motion information for the PUs by performing IME and FME on them, the inter prediction module 121 may also perform a merge operation on each of the PUs. When the inter prediction module 121 performs a merge operation on a PU, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list for the PU may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidate predicted motion vectors. The original candidate predicted motion vectors in the candidate predicted motion vector list may include one or more spatial candidate predicted motion vectors and a temporal candidate predicted motion vector. A spatial candidate predicted motion vector may indicate the motion information of another PU in the current picture. The temporal candidate predicted motion vector may be based on the motion information of a corresponding PU in a picture different from the current picture. The temporal candidate predicted motion vector may also be referred to as a temporal motion vector prediction (TMVP).
After generating the candidate predicted motion vector list, the inter prediction module 121 may select one of the candidate predicted motion vectors from the candidate predicted motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference image block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate predicted motion vector. FIG. 5, described below, illustrates an exemplary flowchart of the merge mode.
After generating a predictive image block for the PU based on IME and FME and generating a predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block produced by the FME operation or the predictive image block produced by the merge operation. In some feasible implementations, the inter prediction module 121 may select the predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block produced by the FME operation and the predictive image block produced by the merge operation.
After the inter prediction module 121 has selected the predictive image blocks of the PUs produced by partitioning the current CU according to each of the partition modes (in some implementations, after a coding tree unit (CTU) is divided into CUs, a CU is not further divided into smaller PUs; in that case the PU is equivalent to the CU), the inter prediction module 121 may select a partition mode for the current CU. In some implementations, the inter prediction module 121 may select the partition mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs produced by partitioning the current CU according to each of the partition modes. The inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partition mode to the residual generation module 102. The inter prediction module 121 may output syntax elements indicating the motion information of the PUs belonging to the selected partition mode to the entropy encoding module 116.
In the schematic diagram of FIG. 4, the inter prediction module 121 includes IME modules 180A to 180N (collectively, "IME modules 180"), FME modules 182A to 182N (collectively, "FME modules 182"), merge modules 184A to 184N (collectively, "merge modules 184"), PU mode decision modules 186A to 186N (collectively, "PU mode decision modules 186"), and a CU mode decision module 188 (which may also perform the mode decision process from CTU to CU).
The IME modules 180, the FME modules 182, and the merge modules 184 may perform IME operations, FME operations, and merge operations on the PUs of the current CU. The schematic diagram of FIG. 4 illustrates the inter prediction module 121 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merge module 184 for each PU of each partition mode of the CU.
As illustrated in the schematic diagram of FIG. 4, the IME module 180A, the FME module 182A, and the merge module 184A may perform an IME operation, an FME operation, and a merge operation on the PU produced by partitioning the CU according to the 2N×2N partition mode. The PU mode decision module 186A may select one of the predictive image blocks produced by the IME module 180A, the FME module 182A, and the merge module 184A.
The IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on the left PU produced by partitioning the CU according to the N×2N partition mode. The PU mode decision module 186B may select one of the predictive image blocks produced by the IME module 180B, the FME module 182B, and the merge module 184B.
The IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on the right PU produced by partitioning the CU according to the N×2N partition mode. The PU mode decision module 186C may select one of the predictive image blocks produced by the IME module 180C, the FME module 182C, and the merge module 184C.
The IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on the lower-right PU produced by partitioning the CU according to the N×N partition mode. The PU mode decision module 186N may select one of the predictive image blocks produced by the IME module 180N, the FME module 182N, and the merge module 184N.
The PU mode decision modules 186 may select predictive image blocks based on a rate-distortion cost analysis of multiple possible predictive image blocks, selecting the predictive image block that provides the best rate-distortion cost for a given decoding scenario. For example, for bandwidth-limited applications, the PU mode decision modules 186 may prefer predictive image blocks that increase the compression ratio, whereas for other applications the PU mode decision modules 186 may prefer predictive image blocks that increase the quality of the reconstructed video. After the PU mode decision modules 186 select the predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects the partition mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partition mode.
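The trade-off in such a rate-distortion decision is commonly expressed as a Lagrangian cost, cost = D + λ·R, where D is distortion, R is the rate in bits, and λ weights one against the other. The sketch below assumes this standard formulation; the option names and numeric values are made up for illustration.

```python
# Sketch of a rate-distortion mode decision: cost = D + lambda * R.
# A larger lambda favors lower bit rate (higher compression); a
# smaller lambda favors lower distortion (better reconstruction).
def rd_select(options, lam):
    """options: list of (name, distortion, rate_bits) tuples.
    Return the name of the lowest-cost option."""
    return min(options, key=lambda o: o[1] + lam * o[2])[0]

# Hypothetical candidates: a motion-searched block (accurate but
# costly to signal) vs. a merge block (coarser but nearly free).
opts = [("ime_fme", 100.0, 40), ("merge", 120.0, 5)]
```

With a small λ the motion-searched block wins on distortion; with a large λ the cheap-to-signal merge block wins, which mirrors the bandwidth-limited preference described above.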
FIG. 5 is an exemplary flowchart of the merge mode in an embodiment of the present application. A video encoder (e.g., video encoder 20) may perform the merge operation 200. In other feasible implementations, the video encoder may perform a merge operation different from the merge operation 200. For example, in other feasible implementations, the video encoder may perform a merge operation with more steps than, fewer steps than, or steps different from those of the merge operation 200. In other feasible implementations, the video encoder may perform the steps of the merge operation 200 in a different order or in parallel. The encoder may also perform the merge operation 200 on a PU encoded in skip mode.
After the video encoder begins the merge operation 200, the video encoder may generate a candidate predicted motion vector list for the current PU (202). The video encoder may generate the candidate predicted motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate predicted motion vector list for the current PU according to one of the example techniques described below with respect to FIGS. 8 to 12.
As described above, the candidate predicted motion vector list for the current PU may include a temporal candidate predicted motion vector. The temporal candidate predicted motion vector may indicate the motion information of a temporally co-located PU. The co-located PU may be spatially at the same position in the image frame as the current PU, but in a reference picture rather than the current picture. The present application may refer to a reference picture that includes the temporally corresponding PU as the relevant reference picture, and to the reference picture index of the relevant reference picture as the relevant reference picture index. As described above, the current picture may be associated with one or more reference picture lists (e.g., List 0, List 1, etc.). A reference picture index may indicate a reference picture by indicating its position in one of the reference picture lists. In some feasible implementations, the current picture may be associated with a combined reference picture list.
In some video encoders, the relevant reference picture index is the reference picture index of the PU covering the reference index source position associated with the current PU. In these video encoders, the reference index source position associated with the current PU is adjacent to the left of the current PU or adjacent above the current PU. In the present application, a PU may "cover" a particular position if the image block associated with the PU includes that position. In these video encoders, if the reference index source position is not available, the video encoder may use a reference picture index of zero.
However, there may be cases in which the reference index source position associated with the current PU is within the current CU. In these cases, the PU covering the reference index source position associated with the current PU may be regarded as available if that PU is above or to the left of the current CU. However, the video encoder may then need to access the motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use the motion information (i.e., the reference picture index) of a PU belonging to the current CU to generate the temporal candidate predicted motion vector for the current PU. In other words, these video encoders may generate the temporal candidate predicted motion vector using the motion information of a PU belonging to the current CU. Consequently, the video encoder may be unable to generate, in parallel, the candidate predicted motion vector lists for the current PU and for the PU covering the reference index source position associated with the current PU.
According to the techniques of the present application, the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate the candidate predicted motion vector lists for the current PU and the other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations in which the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (e.g., 0). In this way, the video encoder may generate the temporal candidate predicted motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate predicted motion vector in the candidate predicted motion vector list of the current CU.
In feasible implementations in which the video encoder explicitly sets the relevant reference picture index, the video encoder may explicitly signal the relevant reference picture index in a syntax structure (e.g., a picture header, a slice header, an APS, or another syntax structure). In such feasible implementations, the video encoder may signal to the decoder the relevant reference picture index for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of a CU is equal to "1".
In some feasible implementations, the relevant reference picture index may be set implicitly rather than explicitly. In these feasible implementations, the video encoder may generate each temporal candidate predicted motion vector in the candidate predicted motion vector lists for the PUs of the current CU using the motion information of PUs in the reference pictures indicated by the reference picture indices of PUs covering positions outside the current CU, even if these positions are not strictly adjacent to the current PU.
After generating the candidate predicted motion vector list for the current PU, the video encoder may generate the predictive image blocks associated with the candidate predicted motion vectors in the candidate predicted motion vector list (204). The video encoder may generate the predictive image block associated with a candidate predicted motion vector by determining the motion information of the current PU based on the motion information of the indicated candidate predicted motion vector and then generating the predictive image block based on the one or more reference image blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidate predicted motion vectors from the candidate predicted motion vector list (206). The video encoder may select the candidate predicted motion vector in various ways. For example, the video encoder may select one of the candidate predicted motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate predicted motion vectors.
After selecting a candidate predicted motion vector, the video encoder may output a candidate predicted motion vector index (208). The candidate predicted motion vector index may indicate the position of the selected candidate predicted motion vector in the candidate predicted motion vector list. In some feasible implementations, the candidate predicted motion vector index may be denoted "merge_idx".
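Steps 202 to 208 of the merge operation can be sketched end to end as follows. The candidate motion vectors and the cost function are made-up stand-ins for a real candidate list and a real rate-distortion analysis; only the index of the winning candidate would be signaled in the bitstream.

```python
# Toy sketch of the merge flow (steps 202-208): build a candidate
# list, cost each candidate, pick the best, and signal only its index.
def merge_select(candidates, rd_cost):
    """Return (merge_idx, motion_info) of the lowest-cost candidate."""
    merge_idx = min(range(len(candidates)),
                    key=lambda i: rd_cost(candidates[i]))
    return merge_idx, candidates[merge_idx]

# Hypothetical candidate list: (mv_x, mv_y) pairs that would come from
# spatial neighbors and the temporal co-located PU.
cands = [(4, 0), (3, -1), (0, 0)]
# Stand-in cost: distance to the motion the encoder actually wants.
idx, mv = merge_select(cands, rd_cost=lambda v: abs(v[0] - 3) + abs(v[1] + 1))
```

In merge mode the decoder rebuilds the same candidate list, so transmitting `idx` (i.e., `merge_idx`) alone is enough to recover the full motion information `mv`.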
FIG. 6 is an exemplary flowchart of the advanced motion vector prediction mode in an embodiment of the present application. A video encoder (e.g., video encoder 20) may perform the AMVP operation 210.
After the video encoder begins the AMVP operation 210, the video encoder may generate one or more motion vectors for the current PU (211). The video encoder may perform integer motion estimation and fractional motion estimation to generate the motion vectors for the current PU. As described above, the current picture may be associated with two reference picture lists (List 0 and List 1). If the current PU is uni-directionally predicted, the video encoder may generate a List 0 motion vector or a List 1 motion vector for the current PU. The List 0 motion vector may indicate the spatial displacement between the image block of the current PU and a reference image block in a reference picture in List 0. The List 1 motion vector may indicate the spatial displacement between the image block of the current PU and a reference image block in a reference picture in List 1. If the current PU is bi-directionally predicted, the video encoder may generate both a List 0 motion vector and a List 1 motion vector for the current PU.
After generating the one or more motion vectors for the current PU, the video encoder may generate a predictive image block for the current PU (212). The video encoder may generate the predictive image block for the current PU based on the one or more reference image blocks indicated by the one or more motion vectors for the current PU.
In addition, the video encoder may generate a candidate predicted motion vector list for the current PU (213). The video encoder may generate the candidate predicted motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate predicted motion vector list for the current PU according to one or more of the feasible implementations described below with respect to FIGS. 8 to 12. In some feasible implementations, when the video encoder generates the candidate predicted motion vector list in the AMVP operation 210, the candidate predicted motion vector list may be limited to two candidate predicted motion vectors. In contrast, when the video encoder generates the candidate predicted motion vector list in a merge operation, the candidate predicted motion vector list may include more candidate predicted motion vectors (e.g., five candidate predicted motion vectors).
After generating the candidate predicted motion vector list for the current PU, the video encoder may generate one or more motion vector differences (MVDs) for each candidate predicted motion vector in the candidate predicted motion vector list (214). The video encoder may generate a motion vector difference for a candidate predicted motion vector by determining the difference between the motion vector indicated by the candidate predicted motion vector and the corresponding motion vector of the current PU.
If the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate predicted motion vector. If the current PU is bi-directionally predicted, the video encoder may generate two MVDs for each candidate predicted motion vector. The first MVD may indicate the difference between the motion vector of the candidate predicted motion vector and the List 0 motion vector of the current PU. The second MVD may indicate the difference between the motion vector of the candidate predicted motion vector and the List 1 motion vector of the current PU.
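The MVD derivation above amounts to simple component-wise subtraction; a uni-predicted PU yields one MVD and a bi-predicted PU yields two. The sketch below assumes integer (x, y) motion vectors, and the function names are illustrative.

```python
# Sketch of MVD generation in AMVP: the MVD is the component-wise
# difference between an actual motion vector and the candidate
# predicted motion vector.
def mvd(mv, predictor):
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def amvp_mvds(candidate_mv, list0_mv, list1_mv=None):
    """Uni-prediction yields a single MVD (List 0); bi-prediction
    yields two MVDs (List 0 and List 1) for the same candidate."""
    mvds = [mvd(list0_mv, candidate_mv)]
    if list1_mv is not None:
        mvds.append(mvd(list1_mv, candidate_mv))
    return mvds
```

The decoder reverses this by adding the signaled MVD(s) back to the candidate predicted motion vector identified by the signaled index.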
视频编码器可从候选预测运动矢量列表选择候选预测运动矢量中的一个或多个(215)。视频编码器可以各种方式选择一个或多个候选预测运动矢量。举例来说,视频编码器可选择具有最小误差地匹配待编码的运动矢量的相关联运动矢量的候选预测运动矢量,此可减少表示用于候选预测运动矢量的运动矢量差所需的位数目。The video encoder may select one or more of the candidate predicted motion vectors from the candidate predicted motion vector list (215). The video encoder can select the one or more candidate predicted motion vectors in various ways. For example, the video encoder may select the candidate predicted motion vector whose associated motion vector matches the motion vector to be encoded with the smallest error, which may reduce the number of bits needed to represent the motion vector difference for the candidate predicted motion vector.
在选择一个或多个候选预测运动矢量之后,视频编码器可输出用于当前PU的一个或多个参考图像索引、一个或多个候选预测运动矢量索引,和用于一个或多个选定候选预测运动矢量的一个或多个运动矢量差(216)。After selecting one or more candidate predicted motion vectors, the video encoder may output one or more reference picture indices for the current PU, one or more candidate predicted motion vector indices, and one or more motion vector differences for the one or more selected candidate predicted motion vectors (216).
在当前图像与两个参考图像列表(列表0和列表1)相关联且当前PU经单向预测的例子中,视频编码器可输出用于列表0的参考图像索引(“ref_idx_l0”)或用于列表1的参考图像索引(“ref_idx_l1”)。视频编码器还可输出指示用于当前PU的列表0运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_l0_flag”)。或者,视频编码器可输出指示用于当前PU的列表1运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_l1_flag”)。视频编码器还可输出用于当前PU的列表0运动矢量或列表1运动矢量的MVD。In an example where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is unidirectionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predicted motion vector index ("mvp_l0_flag") indicating the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU. Alternatively, the video encoder may output a candidate predicted motion vector index ("mvp_l1_flag") indicating the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU. The video encoder may also output an MVD for the list 0 motion vector or the list 1 motion vector of the current PU.
在当前图像与两个参考图像列表(列表0和列表1)相关联且当前PU经双向预测的例子中,视频编码器可输出用于列表0的参考图像索引(“ref_idx_l0”)和用于列表1的参考图像索引(“ref_idx_l1”)。视频编码器还可输出指示用于当前PU的列表0运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_l0_flag”)。另外,视频编码器可输出指示用于当前PU的列表1运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_l1_flag”)。视频编码器还可输出用于当前PU的列表0运动矢量的MVD和用于当前PU的列表1运动矢量的MVD。In an example where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bi-predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate predicted motion vector index ("mvp_l0_flag") indicating the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 0 motion vector of the current PU. Additionally, the video encoder may output a candidate predicted motion vector index ("mvp_l1_flag") indicating the position in the candidate predicted motion vector list of the selected candidate predicted motion vector for the list 1 motion vector of the current PU. The video encoder may also output an MVD for the list 0 motion vector of the current PU and an MVD for the list 1 motion vector of the current PU.
图7为本申请实施例中由视频解码器(例如视频解码器30)执行的运动补偿的一种示例性流程图。FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder (e.g., video decoder 30) in an embodiment of the present application.
当视频解码器执行运动补偿操作220时,视频解码器可接收用于当前PU的选定候选预测运动矢量的指示(222)。举例来说,视频解码器可接收指示选定候选预测运动矢量在当前PU的候选预测运动矢量列表内的位置的候选预测运动矢量索引。When the video decoder performs motion compensation operation 220, the video decoder may receive an indication of the selected candidate predicted motion vector for the current PU (222). For example, the video decoder may receive a candidate predicted motion vector index indicating the position of the selected candidate predicted motion vector within the candidate predicted motion vector list of the current PU.
如果当前PU的运动信息是使用AMVP模式进行编码且当前PU经双向预测,则视频解码器可接收第一候选预测运动矢量索引和第二候选预测运动矢量索引。第一候选预测运动矢量索引指示用于当前PU的列表0运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置。第二候选预测运动矢量索引指示用于当前PU的列表1运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置。在一些可行的实施方式中,单一语法元素可用以识别两个候选预测运动矢量索引。If the motion information of the current PU is encoded using the AMVP mode and the current PU is bi-predicted, the video decoder may receive the first candidate predicted motion vector index and the second candidate predicted motion vector index. The first candidate predicted motion vector index indicates the location of the selected candidate predicted motion vector for the list 0 motion vector of the current PU in the candidate predicted motion vector list. The second candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list. In some possible implementations, a single syntax element can be used to identify two candidate predicted motion vector indices.
另外,视频解码器可产生用于当前PU的候选预测运动矢量列表(224)。视频解码器可以各种方式产生用于当前PU的此候选预测运动矢量列表。举例来说,视频解码器可使用下文参看图8到图12描述的技术来产生用于当前PU的候选预测运动矢量列表。当视频解码器产生用于候选预测运动矢量列表的时间候选预测运动矢量时,视频解码器可显式地或隐式地设定识别包括co-located PU的参考图像的参考图像索引,如前文关于图5所描述。Additionally, the video decoder may generate a list of candidate predicted motion vectors for the current PU (224). The video decoder can generate this candidate predicted motion vector list for the current PU in various ways. For example, the video decoder may generate the candidate predicted motion vector list for the current PU using the techniques described below with reference to Figures 8 to 12. When the video decoder generates a temporal candidate predicted motion vector for the candidate predicted motion vector list, the video decoder may explicitly or implicitly set the reference picture index identifying the reference picture that includes the co-located PU, as described above with respect to Figure 5.
在产生用于当前PU的候选预测运动矢量列表之后,视频解码器可基于由用于当前PU的候选预测运动矢量列表中的一个或多个选定候选预测运动矢量指示的运动信息确定当前PU的运动信息(225)。举例来说,如果当前PU的运动信息是使用合并模式而编码,则当前PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。如果当前PU的运动信息是使用AMVP模式而编码,则视频解码器可使用由所述一个或多个选定候选预测运动矢量指示的一个或多个运动矢量和码流中指示的一个或多个MVD来重建当前PU的一个或多个运动矢量。当前PU的参考图像索引和预测方向标识可与所述一个或多个选定候选预测运动矢量的参考图像索引和预测方向标识相同。在确定当前PU的运动信息之后,视频解码器可基于由当前PU的运动信息指示的一个或多个参考图像块产生用于当前PU的预测性图像块(226)。After generating the candidate predicted motion vector list for the current PU, the video decoder may determine the motion information of the current PU (225) based on the motion information indicated by the one or more selected candidate predicted motion vectors in the candidate predicted motion vector list for the current PU. For example, if the motion information of the current PU was encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate predicted motion vector. If the motion information of the current PU was encoded using the AMVP mode, the video decoder may use the one or more motion vectors indicated by the one or more selected candidate predicted motion vectors and the one or more MVDs indicated in the bitstream to reconstruct the one or more motion vectors of the current PU. The reference picture index and the prediction direction indicator of the current PU may be the same as the reference picture index and the prediction direction indicator of the one or more selected candidate predicted motion vectors. After determining the motion information of the current PU, the video decoder may generate a predictive picture block for the current PU based on the one or more reference picture blocks indicated by the motion information of the current PU (226).
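In AMVP mode, the decoder-side motion vector reconstruction step described above reduces to adding the signalled MVD to the selected predictor. A minimal sketch (the tuple representation of motion vectors is a hypothetical simplification):

```python
def reconstruct_mv(mvp, mvd):
    """AMVP reconstruction: PU motion vector = selected candidate predictor
    (MVP) plus the motion vector difference (MVD) parsed from the bitstream."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])
```

For a bi-predicted PU the decoder would apply this once per reference list, using the list 0 and list 1 MVDs respectively.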
图8为本申请实施例中编码单元及与其关联的相邻位置图像块的一种示例性示意图,说明CU250和与CU250相关联的示意性的候选预测运动矢量位置252A到252E的示意图。本申请可将候选预测运动矢量位置252A到252E统称为候选预测运动矢量位置252。候选预测运动矢量位置252表示与CU250在同一图像中的空间候选预测运动矢量。候选预测运动矢量位置252A定位于CU250左方。候选预测运动矢量位置252B定位于CU250上方。候选预测运动矢量位置252C定位于CU250右上方。候选预测运动矢量位置252D定位于CU250左下方。候选预测运动矢量位置252E定位于CU250左上方。图8为用以提供帧间预测模块121和运动补偿模块162可产生候选预测运动矢量列表的方式的示意性实施方式。下文将参考帧间预测模块121解释实施方式,但应理解运动补偿模块162可实施相同技术,且因此产生相同候选预测运动矢量列表。FIG. 8 is an exemplary schematic diagram of a coding unit and adjacent-position image blocks associated with it in an embodiment of the present application, illustrating CU 250 and exemplary candidate predicted motion vector positions 252A through 252E associated with CU 250. The present application may collectively refer to candidate predicted motion vector positions 252A through 252E as candidate predicted motion vector positions 252. The candidate predicted motion vector positions 252 represent spatial candidate predicted motion vectors in the same picture as CU 250. Candidate predicted motion vector position 252A is located to the left of CU 250. Candidate predicted motion vector position 252B is located above CU 250. Candidate predicted motion vector position 252C is located to the upper right of CU 250. Candidate predicted motion vector position 252D is located to the lower left of CU 250. Candidate predicted motion vector position 252E is located to the upper left of CU 250. FIG. 8 is a schematic implementation of the manner in which the inter prediction module 121 and the motion compensation module 162 may generate candidate predicted motion vector lists. The implementation will be explained below with reference to the inter prediction module 121, but it should be understood that the motion compensation module 162 may implement the same techniques and thus generate the same candidate predicted motion vector list.
图9为本申请实施例中构建候选预测运动矢量列表的一种示例性流程图。将参考包括五个候选预测运动矢量的列表描述图9的技术,但本文中所描述的技术还可与具有其它大小的列表一起使用。五个候选预测运动矢量可各自具有索引(例如,0到4)。将参考一般视频解码器描述图9的技术。一般视频解码器示例性的可以为视频编码器(例如视频编码器20)或视频解码器(例如视频解码器30)。FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in the embodiment of the present application. The technique of FIG. 9 will be described with reference to a list including five candidate predicted motion vectors, but the techniques described herein may also be used with lists of other sizes. The five candidate predicted motion vectors may each have an index (eg, 0 to 4). The technique of FIG. 9 will be described with reference to a general video decoder. A typical video decoder may illustratively be a video encoder (e.g., video encoder 20) or a video decoder (e.g., video decoder 30).
为了根据图9的实施方式重建候选预测运动矢量列表,视频解码器首先考虑四个 空间候选预测运动矢量(902)。四个空间候选预测运动矢量可以包括候选预测运动矢量位置252A、252B、252C和252D。四个空间候选预测运动矢量对应于与当前CU(例如,CU250)在同一图像中的四个PU的运动信息。视频解码器可以特定次序考虑列表中的四个空间候选预测运动矢量。举例来说,候选预测运动矢量位置252A可被第一个考虑。如果候选预测运动矢量位置252A可用,则候选预测运动矢量位置252A可指派到索引0。如果候选预测运动矢量位置252A不可用,则视频解码器可不将候选预测运动矢量位置252A包括于候选预测运动矢量列表中。候选预测运动矢量位置可出于各种理由而不可用。举例来说,如果候选预测运动矢量位置不在当前图像内,则候选预测运动矢量位置可能不可用。在另一可行的实施方式中,如果候选预测运动矢量位置经帧内预测,则候选预测运动矢量位置可能不可用。在另一可行的实施方式中,如果候选预测运动矢量位置在与当前CU不同的条带中,则候选预测运动矢量位置可能不可用。To reconstruct a candidate predicted motion vector list in accordance with the embodiment of Figure 9, the video decoder first considers four spatial candidate predicted motion vectors (902). The four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate prediction motion vectors correspond to motion information of four PUs in the same image as the current CU (eg, CU 250). The video decoder may consider four spatial candidate prediction motion vectors in the list in a particular order. For example, the candidate predicted motion vector location 252A can be considered first. If the candidate predicted motion vector location 252A is available, the candidate predicted motion vector location 252A may be assigned to index 0. If the candidate predicted motion vector location 252A is not available, the video decoder may not include the candidate predicted motion vector location 252A in the candidate predicted motion vector list. Candidate predicted motion vector locations may not be available for a variety of reasons. For example, if the candidate predicted motion vector location is not within the current image, the candidate predicted motion vector location may not be available. In another possible implementation, if the candidate predicted motion vector location is intra predicted, the candidate predicted motion vector location may not be available. 
In another possible implementation, if the candidate predicted motion vector location is in a different slice than the current CU, the candidate predicted motion vector location may not be available.
在考虑候选预测运动矢量位置252A之后,视频解码器可接下来考虑候选预测运动矢量位置252B。如果候选预测运动矢量位置252B可用且不同于候选预测运动矢量位置252A,则视频解码器可将候选预测运动矢量位置252B添加到候选预测运动矢量列表。在此特定上下文中,术语“相同”和“不同”指代与候选预测运动矢量位置相关联的运动信息。因此,如果两个候选预测运动矢量位置具有相同运动信息则被视为相同,且如果其具有不同运动信息则被视为不同。如果候选预测运动矢量位置252A不可用,则视频解码器可将候选预测运动矢量位置252B指派到索引0。如果候选预测运动矢量位置252A可用,则视频解码器可将候选预测运动矢量位置252B指派到索引1。如果候选预测运动矢量位置252B不可用或与候选预测运动矢量位置252A相同,则视频解码器跳过候选预测运动矢量位置252B且不将其包括于候选预测运动矢量列表中。After considering the candidate predicted motion vector location 252A, the video decoder may next consider the candidate predicted motion vector location 252B. If the candidate predicted motion vector location 252B is available and different from the candidate predicted motion vector location 252A, the video decoder may add the candidate predicted motion vector location 252B to the candidate predicted motion vector list. In this particular context, the terms "identical" and "different" refer to the motion information associated with the candidate predicted motion vector locations. Therefore, two candidate predicted motion vector locations are considered identical if they have the same motion information, and are considered different if they have different motion information. If the candidate predicted motion vector location 252A is not available, the video decoder may assign the candidate predicted motion vector location 252B to index 0. If the candidate predicted motion vector location 252A is available, the video decoder may assign the candidate predicted motion vector location 252B to index 1. If the candidate predicted motion vector location 252B is not available or is the same as the candidate predicted motion vector location 252A, the video decoder skips the candidate predicted motion vector location 252B and does not include it in the candidate predicted motion vector list.
候选预测运动矢量位置252C由视频解码器类似地考虑以供包括于列表中。如果候选预测运动矢量位置252C可用且不与候选预测运动矢量位置252B和252A相同,则视频解码器将候选预测运动矢量位置252C指派到下一可用索引。如果候选预测运动矢量位置252C不可用或并非不同于候选预测运动矢量位置252A和252B中的至少一者,则视频解码器不将候选预测运动矢量位置252C包括于候选预测运动矢量列表中。接下来,视频解码器考虑候选预测运动矢量位置252D。如果候选预测运动矢量位置252D可用且不与候选预测运动矢量位置252A、252B和252C相同,则视频解码器将候选预测运动矢量位置252D指派到下一可用索引。如果候选预测运动矢量位置252D不可用或并非不同于候选预测运动矢量位置252A、252B和252C中的至少一者,则视频解码器不将候选预测运动矢量位置252D包括于候选预测运动矢量列表中。以上实施方式大体上描述示例性地考虑候选预测运动矢量252A到252D以供包括于候选预测运动矢量列表中,但在一些实施方式中,可首先将所有候选预测运动矢量252A到252D添加到候选预测运动矢量列表,稍后从候选预测运动矢量列表移除重复。The candidate predicted motion vector location 252C is similarly considered by the video decoder for inclusion in the list. If the candidate predicted motion vector location 252C is available and not the same as the candidate predicted motion vector locations 252B and 252A, the video decoder assigns the candidate predicted motion vector location 252C to the next available index. If the candidate predicted motion vector location 252C is not available or is not different than at least one of the candidate predicted motion vector locations 252A and 252B, the video decoder does not include the candidate predicted motion vector location 252C in the candidate predicted motion vector list. Next, the video decoder considers the candidate predicted motion vector location 252D. If the candidate predicted motion vector location 252D is available and not the same as the candidate predicted motion vector locations 252A, 252B, and 252C, the video decoder assigns the candidate predicted motion vector location 252D to the next available index. If the candidate predicted motion vector location 252D is not available or is not different than at least one of the candidate predicted motion vector locations 252A, 252B, and 252C, the video decoder does not include the candidate predicted motion vector location 252D in the candidate predicted motion vector list.
The above embodiments generally describe, by way of example, considering candidate predicted motion vectors 252A through 252D for inclusion in the candidate predicted motion vector list, but in some implementations all of candidate predicted motion vectors 252A through 252D may first be added to the candidate predicted motion vector list, with duplicates removed from the list later.
在视频解码器考虑前四个空间候选预测运动矢量之后,候选预测运动矢量列表可能包括四个空间候选预测运动矢量或者该列表可能包括少于四个空间候选预测运动矢量。如果列表包括四个空间候选预测运动矢量(904,是),则视频解码器考虑时间候选预测运动矢量(906)。时间候选预测运动矢量可对应于不同于当前图像的图像的co-located PU的运动信息。如果时间候选预测运动矢量可用且不同于前四个空间候选预测运动矢量,则视频解码器将时间候选预测运动矢量指派到索引4。如果时间候选预测运动矢量不可用或与前四个空间候选预测运动矢量中的一者相同,则视频解码器不将所述时间候选预测运动矢量包括于候选预测运动矢量列表中。因此,在视频解码器考虑时间候选预测运动矢量(906)之后,候选预测运动矢量列表可能包括五个候选预测运动矢量(框902处考虑的前四个空间候选预测运动矢量和框906处考虑的时间候选预测运动矢量)或可能包括四个候选预测运动矢量(框902处考虑的前四个空间候选预测运动矢量)。如果候选预测运动矢量列表包括五个候选预测运动矢量(908,是),则视频解码器完成构建列表。After the video decoder considers the first four spatial candidate predicted motion vectors, the candidate predicted motion vector list may include four spatial candidate predicted motion vectors, or the list may include fewer than four spatial candidate predicted motion vectors. If the list includes four spatial candidate predicted motion vectors (904, yes), the video decoder considers the temporal candidate predicted motion vector (906). The temporal candidate predicted motion vector may correspond to the motion information of a co-located PU in a picture different from the current picture. If the temporal candidate predicted motion vector is available and different from the first four spatial candidate predicted motion vectors, the video decoder assigns the temporal candidate predicted motion vector to index 4. If the temporal candidate predicted motion vector is not available or is identical to one of the first four spatial candidate predicted motion vectors, the video decoder does not include the temporal candidate predicted motion vector in the candidate predicted motion vector list.
Thus, after the video decoder considers the temporal candidate predicted motion vector (906), the candidate predicted motion vector list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the temporal candidate predicted motion vector considered at block 906) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902). If the candidate predicted motion vector list includes five candidate predicted motion vectors (908, yes), the video decoder completes building the list.
如果候选预测运动矢量列表包括四个候选预测运动矢量(908,否),则视频解码器可考虑第五空间候选预测运动矢量(910)。第五空间候选预测运动矢量可(例如)对应于候选预测运动矢量位置252E。如果位置252E处的候选预测运动矢量可用且不同于位置252A、252B、252C和252D处的候选预测运动矢量,则视频解码器可将第五空间候选预测运动矢量添加到候选预测运动矢量列表,第五空间候选预测运动矢量经指派到索引4。如果位置252E处的候选预测运动矢量不可用或并非不同于候选预测运动矢量位置252A、252B、252C和252D处的候选预测运动矢量,则视频解码器可不将位置252E处的候选预测运动矢量包括于候选预测运动矢量列表中。因此在考虑第五空间候选预测运动矢量(910)之后,列表可能包括五个候选预测运动矢量(框902处考虑的前四个空间候选预测运动矢量和框910处考虑的第五空间候选预测运动矢量)或可能包括四个候选预测运动矢量(框902处考虑的前四个空间候选预测运动矢量)。If the candidate predicted motion vector list includes four candidate predicted motion vectors (908, no), the video decoder may consider a fifth spatial candidate predicted motion vector (910). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at position 252E is available and different from the candidate predicted motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate predicted motion vector to the candidate predicted motion vector list, with the fifth spatial candidate predicted motion vector assigned to index 4. If the candidate predicted motion vector at position 252E is not available or is not different from the candidate predicted motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate predicted motion vector at position 252E in the candidate predicted motion vector list. Thus, after considering the fifth spatial candidate predicted motion vector (910), the list may include five candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902 and the fifth spatial candidate predicted motion vector considered at block 910) or may include four candidate predicted motion vectors (the first four spatial candidate predicted motion vectors considered at block 902).
如果候选预测运动矢量列表包括五个候选预测运动矢量(912,是),则视频解码器完成产生候选预测运动矢量列表。如果候选预测运动矢量列表包括四个候选预测运动矢量(912,否),则视频解码器添加人工产生的候选预测运动矢量(914)直到列表包括五个候选预测运动矢量(916,是)为止。If the candidate predicted motion vector list includes five candidate predicted motion vectors (912, YES), the video decoder finishes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes four candidate predicted motion vectors (912, No), the video decoder adds the artificially generated candidate predicted motion vectors (914) until the list includes five candidate predicted motion vectors (916, YES).
如果在视频解码器考虑前四个空间候选预测运动矢量之后,列表包括少于四个空间候选预测运动矢量(904,否),则视频解码器可考虑第五空间候选预测运动矢量(918)。第五空间候选预测运动矢量可(例如)对应于候选预测运动矢量位置252E。如果位置252E处的候选预测运动矢量可用且不同于已包括于候选预测运动矢量列表中的候选预测运动矢量,则视频解码器可将第五空间候选预测运动矢量添加到候选预测运动矢量列表,第五空间候选预测运动矢量经指派到下一可用索引。如果位置252E处的候选预测运动矢量不可用或并非不同于已包括于候选预测运动矢量列表中的候选预测运动矢量中的一者,则视频解码器可不将位置252E处的候选预测运动矢量包括于候选预测运动矢量列表中。视频解码器可接着考虑时间候选预测运动矢量(920)。If, after the video decoder considers the first four spatial candidate predicted motion vectors, the list includes fewer than four spatial candidate predicted motion vectors (904, no), the video decoder may consider the fifth spatial candidate predicted motion vector (918). The fifth spatial candidate predicted motion vector may, for example, correspond to candidate predicted motion vector position 252E. If the candidate predicted motion vector at position 252E is available and different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may add the fifth spatial candidate predicted motion vector to the candidate predicted motion vector list, with the fifth spatial candidate predicted motion vector assigned to the next available index. If the candidate predicted motion vector at position 252E is not available or is not different from one of the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may not include the candidate predicted motion vector at position 252E in the candidate predicted motion vector list. The video decoder may then consider the temporal candidate predicted motion vector (920).
If the temporal candidate predicted motion vector is available and different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may add the temporal candidate predicted motion vector to the candidate predicted motion vector list, with the temporal candidate predicted motion vector assigned to the next available index. If the temporal candidate predicted motion vector is not available or is not different from one of the candidate predicted motion vectors already included in the candidate predicted motion vector list, the video decoder may not include the temporal candidate predicted motion vector in the candidate predicted motion vector list.
如果在考虑第五空间候选预测运动矢量(框918)和时间候选预测运动矢量(框920)之后,候选预测运动矢量列表包括五个候选预测运动矢量(922,是),则视频解码器完成产生候选预测运动矢量列表。如果候选预测运动矢量列表包括少于五个候选预测运动矢量(922,否),则视频解码器添加人工产生的候选预测运动矢量(914)直到列表包括五个候选预测运动矢量(916,是)为止。If, after considering the fifth spatial candidate predicted motion vector (block 918) and the temporal candidate predicted motion vector (block 920), the candidate predicted motion vector list includes five candidate predicted motion vectors (922, yes), the video decoder completes generating the candidate predicted motion vector list. If the candidate predicted motion vector list includes fewer than five candidate predicted motion vectors (922, no), the video decoder adds artificially generated candidate predicted motion vectors (914) until the list includes five candidate predicted motion vectors (916, yes).
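The overall flow of the Figure 9 list construction (consider candidates in a fixed order, skip unavailable ones and duplicates, then pad with generated candidates) can be sketched as below. Representing each candidate as a hashable motion-information tuple and marking unavailable candidates as `None` are illustrative assumptions, not the codec's data model:

```python
def build_candidate_list(spatial, temporal, fifth_spatial, fillers, size=5):
    """Sketch of the Figure 9 flow. `spatial` holds the motion information of
    positions 252A-252D; `fifth_spatial` is position 252E; `fillers` are the
    artificially generated candidates added at block 914."""
    cands = []

    def consider(cand):
        # Add only if available, not a duplicate, and the list is not full.
        if cand is not None and cand not in cands and len(cands) < size:
            cands.append(cand)

    for cand in spatial:          # first four spatial candidates (block 902)
        consider(cand)
    if len(cands) < 4:            # 904: fewer than four spatial candidates
        consider(fifth_spatial)   # block 918
        consider(temporal)        # block 920
    else:
        consider(temporal)        # block 906
        consider(fifth_spatial)   # block 910, only reached if still short
    for cand in fillers:          # block 914: pad to the fixed size
        consider(cand)
    return cands
```

The index of each candidate is simply its position in the returned list, matching the index assignment (0 through 4) described above.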
根据本申请的技术,可在空间候选预测运动矢量和时间候选预测运动矢量之后人工产生额外合并候选预测运动矢量以使合并候选预测运动矢量列表的大小固定为合并候选预测运动矢量的指定数目(例如前文图9的可行的实施方式中的五个)。额外合并候选预测运动矢量可包括示例性的经组合双向预测性合并候选预测运动矢量(候选预测运动矢量1)、经缩放双向预测性合并候选预测运动矢量(候选预测运动矢量2),和零向量Merge/AMVP候选预测运动矢量(候选预测运动矢量3)。According to the techniques of the present application, additional merge candidate predicted motion vectors may be artificially generated after the spatial and temporal candidate predicted motion vectors to fix the size of the merge candidate predicted motion vector list at a specified number of merge candidate predicted motion vectors (e.g., five in the possible implementation of Figure 9 above). The additional merge candidate predicted motion vectors may include, for example, a combined bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 1), a scaled bi-predictive merge candidate predicted motion vector (candidate predicted motion vector 2), and a zero-vector merge/AMVP candidate predicted motion vector (candidate predicted motion vector 3).
图10为本申请实施例中将经过组合的候选运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图。经组合双向预测性合并候选预测运动矢量可通过组合原始合并候选预测运动矢量而产生。具体来说,原始候选预测运动矢量中的两个候选预测运动矢量(其具有mvL0和refIdxL0或mvL1和refIdxL1)可用以产生双向预测性合并候选预测运动矢量。在图10中,两个候选预测运动矢量包括于原始合并候选预测运动矢量列表中。一候选预测运动矢量的预测类型为列表0单向预测,且另一候选预测运动矢量的预测类型为列表1单向预测。在此可行的实施方式中,mvL0_A和ref0是从列表0拾取,且mvL1_B和ref0是从列表1拾取,且接着可产生双向预测性合并候选预测运动矢量(其具有列表0中的mvL0_A和ref0以及列表1中的mvL1_B和ref0)并检查其是否不同于已包括于候选预测运动矢量列表中的候选预测运动矢量。如果其不同,则视频解码器可将双向预测性合并候选预测运动矢量包括于候选预测运动矢量列表中。FIG. 10 is an exemplary schematic diagram of adding combined candidate motion vectors to the merge mode candidate predicted motion vector list in an embodiment of the present application. A combined bi-predictive merge candidate predicted motion vector may be generated by combining original merge candidate predicted motion vectors. Specifically, two candidate predicted motion vectors among the original candidate predicted motion vectors (which have mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate a bi-predictive merge candidate predicted motion vector. In FIG. 10, two candidate predicted motion vectors are included in the original merge candidate predicted motion vector list. The prediction type of one candidate predicted motion vector is list 0 uni-prediction, and the prediction type of the other candidate predicted motion vector is list 1 uni-prediction. In this possible implementation, mvL0_A and ref0 are picked from list 0, and mvL1_B and ref0 are picked from list 1; a bi-predictive merge candidate predicted motion vector (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) can then be generated and checked as to whether it is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list. If it is different, the video decoder may include the bi-predictive merge candidate predicted motion vector in the candidate predicted motion vector list.
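The combination step of Figure 10 pairs the list 0 motion of one uni-predicted candidate with the list 1 motion of another. A hedged sketch follows; the `l0`/`l1` dict fields holding (motion vector, reference index) pairs are hypothetical names, not the codec's structures:

```python
def combined_bipred(cand_a, cand_b):
    """Build a combined bi-predictive merge candidate from the list 0 motion
    (mv, ref_idx) of cand_a and the list 1 motion of cand_b. Returns None if
    either source candidate lacks the required motion information."""
    if cand_a.get("l0") is None or cand_b.get("l1") is None:
        return None
    return {"l0": cand_a["l0"], "l1": cand_b["l1"]}
```

The caller would still apply the duplicate check described above before adding the result to the merge candidate list.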
图11为本申请实施例中将经过缩放的候选运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图。经缩放双向预测性合并候选预测运动矢量可通过缩放原始合并候选预测运动矢量而产生。具体来说,来自原始候选预测运动矢量的一候选预测运动矢量(其可具有mvLX和refIdxLX)可用以产生双向预测性合并候选预测运动矢量。在图11的可行的实施方式中,两个候选预测运动矢量包括于原始合并候选预测运动矢量列表中。一候选预测运动矢量的预测类型为列表0单向预测,且另一候选预测运动矢量的预测类型为列表1单向预测。在此可行的实施方式中,mvL0_A和ref0可从列表0拾取,且ref0可复制到列表1中的参考索引ref0′。接着,可通过缩放具有ref0和ref0′的mvL0_A而计算mvL0′_A。缩放可取决于POC距离。接着,可产生双向预测性合并候选预测运动矢量(其具有列表0中的mvL0_A和ref0以及列表1中的mvL0′_A和ref0′)并检查其是否为重复的。如果其并非重复的,则可将其添加到合并候选预测运动矢量列表。FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application. The scaled bi-predictive merge candidate prediction motion vector may be generated by scaling the original merge candidate prediction motion vector. In particular, a candidate predicted motion vector (which may have mvLX and refIdxLX) from the original candidate predicted motion vector may be used to generate a bi-predictive merge candidate predictive motion vector. In a possible implementation of Figure 11, two candidate predicted motion vectors are included in the original merge candidate predictive motion vector list. The prediction type of one candidate prediction motion vector is list 0 unidirectional prediction, and the prediction type of another candidate prediction motion vector is list 1 unidirectional prediction. In this possible implementation, mvL0_A and ref0 may be picked up from list 0, and ref0 may be copied to reference index ref0' in list 1. Next, mvL0'_A can be calculated by scaling mvL0_A having ref0 and ref0'. The scaling can depend on the POC distance. Next, a bidirectional predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) can be generated and checked for repetition. If it is not a duplicate, it can be added to the merge candidate prediction motion vector list.
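The POC-distance scaling mentioned above can be illustrated as follows. This is a simplified sketch using exact integer arithmetic; a real codec uses a fixed-point approximation of the POC-distance ratio with clipping and rounding:

```python
def scale_mv(mv, poc_cur, poc_ref_from, poc_ref_to):
    """Scale a motion vector by the ratio of POC distances when retargeting
    it from its original reference picture (poc_ref_from) to another
    reference picture (poc_ref_to), both relative to the current picture."""
    num = poc_cur - poc_ref_to    # distance to the target reference
    den = poc_cur - poc_ref_from  # distance to the original reference
    return (mv[0] * num // den, mv[1] * num // den)
```

For example, halving the POC distance halves each motion vector component, matching the intuition that motion is roughly proportional to temporal distance.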
图12为本申请实施例中将零运动矢量添加到合并模式候选预测运动矢量列表的一种示例性示意图。零向量合并候选预测运动矢量可通过组合零向量与可经参考的参考索引而产生。如果零向量候选预测运动矢量并非重复的,则可将其添加到合并候选预测运动矢量列表。对于每一产生的合并候选预测运动矢量,运动信息可与列表中的前一候选预测运动矢量的运动信息比较。FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate motion vector list in the embodiment of the present application. Zero Vector Merging Candidate Prediction Motion Vectors can be generated by combining a zero vector with a reference index that can be referenced. If the zero vector candidate prediction motion vector is not repeated, it may be added to the merge candidate prediction motion vector list. For each generated merge candidate predicted motion vector, the motion information may be compared to the motion information of the previous candidate predicted motion vector in the list.
在一种可行的实施方式中,如果新产生的候选预测运动矢量不同于已包括于候选预测运动矢量列表中的候选预测运动矢量,则将所产生的候选预测运动矢量添加到合并候选预测运动矢量列表。确定新产生的候选预测运动矢量是否不同于已包括于候选预测运动矢量列表中的候选预测运动矢量的过程有时称作修剪(pruning)。通过修剪,每一新产生的候选预测运动矢量可与列表中的现有候选预测运动矢量比较。在一些可行的实施方式中,修剪操作可包括比较一个或多个新候选预测运动矢量与已在候选预测运动矢量列表中的候选预测运动矢量,且不添加作为已在候选预测运动矢量列表中的候选预测运动矢量的重复的新候选预测运动矢量。在另一些可行的实施方式中,修剪操作可包括将一个或多个新候选预测运动矢量添加到候选预测运动矢量列表且稍后从所述列表移除重复候选预测运动矢量。In a possible implementation, if the newly generated candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list, the generated candidate predicted motion vector is added to the merge candidate predicted motion vector list. The process of determining whether a newly generated candidate predicted motion vector is different from the candidate predicted motion vectors already included in the candidate predicted motion vector list is sometimes referred to as pruning. With pruning, each newly generated candidate predicted motion vector is compared to the existing candidate predicted motion vectors in the list. In some possible implementations, the pruning operation may include comparing one or more new candidate predicted motion vectors with the candidate predicted motion vectors already in the candidate predicted motion vector list and not adding any new candidate predicted motion vector that is a duplicate of a candidate predicted motion vector already in the list. In other possible implementations, the pruning operation may include adding one or more new candidate predicted motion vectors to the candidate predicted motion vector list and later removing duplicate candidate predicted motion vectors from the list.
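The second pruning variant described above (add candidates first, then remove duplicates from the list) amounts to order-preserving deduplication of motion information. A minimal sketch, again with candidates represented as hashable tuples for illustration:

```python
def prune(candidates):
    """Keep the first occurrence of each distinct motion information and
    drop later duplicates, preserving the original list order."""
    seen, pruned = set(), []
    for cand in candidates:
        if cand not in seen:
            seen.add(cand)
            pruned.append(cand)
    return pruned
```

The first variant (checking each new candidate before insertion) produces the same final list; the two differ only in when the comparison is performed.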
Section 2.3 of JVET-F1001-v2 describes improved inter coding techniques. Compared with the foregoing embodiments of this application, it further introduces several inter prediction methods such as alternative temporal motion vector prediction (ATMVP) and spatial-temporal motion vector prediction (STMVP). It should be understood that a predicted motion vector obtained by any of these methods (ATMVP, STMVP, or the other methods described in Section 2.3) may serve as a candidate in the foregoing Merge candidate predicted motion vector list, AMVP candidate predicted motion vector list, or another candidate predicted motion vector list. When the block to be processed has a plurality of available candidate predicted motion vectors, the encoder needs an indication to inform the decoder which candidate is used as the actual predicted motion vector for reconstructing the block. Therefore, each candidate predicted motion vector corresponds to an index value or a similar identifier, and each index value corresponds to a binarized representation, also called a binarized string. The binarized representation of the index value of the actual predicted motion vector is the indication information that needs to be transmitted from the encoder to the decoder. Encoding the index values with a reasonable binarization strategy saves coded bits and improves coding efficiency. For example, each candidate predicted motion vector has a certain probability of being selected as the actual predicted motion vector at the encoder; assigning shorter binarized strings (also called codewords) to the index values of high-probability candidates and longer binarized strings to the index values of low-probability candidates saves coded bits.
Specifically, for example, suppose there are three selectable candidate predicted motion vectors, with index 0, index 1, and index 2, and the predicted motion vectors actually selected for a group of blocks to be processed are index 0, index 1, index 1, index 1, index 1, index 0, index 2, index 1. If the index values are encoded under the strategy of assigning shorter binarized strings to high-probability candidates, index 1 corresponds to "1", index 0 corresponds to "00", and index 2 corresponds to "01". The binarized strings "00" and "01" have length 2 and "1" has length 1, so encoding the above group of predicted motion vectors requires "00", "1", "1", "1", "1", "00", "01", "1", with a total length of 11. If the index values are encoded under the opposite strategy, index 2 corresponds to "1", index 0 corresponds to "00", and index 1 corresponds to "01", so encoding the same group requires "00", "01", "01", "01", "01", "00", "1", "01", with a total length of 15. Therefore, the strategy of assigning shorter binarized strings to the index values of high-probability candidates produces shorter binarized strings overall, and in general fewer bits are needed to encode them.
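The bit counts in the example above can be checked with a short calculation; the index sequence and both codeword tables are taken directly from the text:

```python
# Total coded length of a sequence of index values under a given codeword table.

def total_length(index_sequence, codeword_table):
    return sum(len(codeword_table[i]) for i in index_sequence)

sequence = [0, 1, 1, 1, 1, 0, 2, 1]          # indices actually selected

# Strategy 1: shortest codeword for the most probable index (index 1).
table_good = {1: "1", 0: "00", 2: "01"}
# Opposite strategy: shortest codeword for the least probable index (index 2).
table_bad = {2: "1", 0: "00", 1: "01"}

print(total_length(sequence, table_good))    # 11
print(total_length(sequence, table_bad))     # 15
```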
The embodiments of this application aim at the following: when the block to be processed has a plurality of candidate predicted motion vectors, the similarity between the block to be processed and the reference image blocks indicated by its candidate predicted motion vectors is used as prior knowledge to help determine how the identifier of each candidate predicted motion vector is encoded, thereby saving coded bits and improving coding efficiency. In a feasible implementation, because the pixel values of the block to be processed cannot be obtained directly at the decoder, this similarity is approximated by the similarity between the set of reconstructed pixels around the block to be processed and the corresponding set of reconstructed pixels around the reference image block; that is, the similarity between these two sets of reconstructed pixels is used to characterize the similarity between the block to be processed and the reference image block indicated by a candidate predicted motion vector.
Therefore, it should be understood that, taking the encoder as an example, the embodiments of this application apply to any scenario in which one reference image block is determined from a plurality of reference image blocks of the block to be processed and the identification information of that reference image block is encoded. This holds regardless of whether the plurality of reference image blocks come from an inter-type prediction mode, an intra-type prediction mode, an inter-view prediction mode (multi-view or 3D video coding), or an inter-layer prediction mode (scalable video coding); regardless of the specific method used to obtain the reference image blocks (for example ATMVP, STMVP, or intra block copy mode); and regardless of whether the motion vector indicating a reference image block belongs to the entire coding unit or to a sub-coding unit within the coding unit. All prediction modes and methods of obtaining reference image blocks (that is, methods of obtaining motion vectors) that fit this scenario can follow, or be combined with, the solutions in the embodiments of this application to achieve the technical effect of improving coding efficiency.
FIG. 13 is a schematic flowchart of an encoding method 1000 according to an embodiment of this application. As described above, the embodiments of this application apply to various scenarios. For simplicity of description, the ways of obtaining each candidate reference image block of the block to be processed are referred to as mode 1, mode 2, mode 3, and so on. These ways include both different prediction methods, such as ATMVP and STMVP, and different operations within the same prediction method, such as obtaining the motion vector of the left neighboring block in Merge mode and obtaining the motion vector of the upper neighboring block in Merge mode; collectively they are referred to as different obtaining ways and are represented by different modes. Let each mode correspond to one motion vector and to one identifier value.

It should be understood that the motion vectors above include the motion vectors used in conventional inter prediction; the displacement vectors (within the same frame) used to characterize the block to be processed and the reference image block when motion estimation is used in intra prediction; the vectors used to characterize inter-view matching relationships in inter-view prediction; and the vectors used to characterize inter-layer matching relationships in inter-layer prediction. They are collectively referred to as motion vectors and are used to obtain the reference image block of the block to be processed. Each motion vector also corresponds to one piece of reference frame information, and the reference image block indicated by the motion vector comes from the frame identified by that reference frame information. The form of the reference frame information differs across application scenarios. For example, in inter prediction mode, the reference frame information may identify a reconstructed temporal reference frame: when the motion vector of the left neighboring block is obtained in Merge mode, the reference frame information of the left neighboring block must also be obtained, and the corresponding reference image block is determined by the motion vector within the reference frame identified by that information. In intra prediction mode, the reference frame is generally the current frame, so in this scenario the reference frame information can be omitted. In multi-view coding, the reference frame information may identify reconstructed frames of different views at a different time or at the same time. In scalable coding, the reference frame information may identify reconstructed frames of different layers at a different time or at the same time. Depending on the application scenario, the reference frame information may be an index value or a one-bit (0 or 1) flag.

The identifier value corresponding to each mode is used to distinguish the modes from one another; it may be an index value or an identity identifier, without limitation. For illustration, the following correspondence may be established to facilitate the description of the subsequent scheme.
Figure PCTCN2018102632-appb-000003
Figure PCTCN2018102632-appb-000004
Table 1
It should also be understood that the plurality of candidate predicted motion information items form a set, which may exist in the form of a list, in the form of the complement of a list, or in the form of a subset, without limitation.
As shown in FIG. 13, an encoding method 1000 for the predicted motion information of an image block to be processed according to an embodiment of this application includes the following steps.
S1001. Obtain N candidate predicted motion information items of the image block to be processed.
Here, N is an integer greater than 1, and the N candidate predicted motion information items are different from one another. It should be understood that when motion information includes both a motion vector and reference frame information, "different from one another" also covers the case in which the motion vectors are identical but the reference frame information differs. The pruning technique has been introduced above; it should be understood that a pruning operation is performed in the process of obtaining the N candidate predicted motion information items of the image block to be processed, so that the N items finally obtained are mutually distinct. Details are not repeated here.
In a feasible implementation 10011, the obtaining of the N candidate predicted motion information items of the image block to be processed includes: obtaining, in a preset order, the motion information of N mutually distinct image blocks that have a preset positional relationship with the image block to be processed, as the N candidate predicted motion information items.
For example, in the way candidate prediction modes are determined for the Merge mode specified in the H.265 standard described above, the motion information of image blocks having a preset spatial positional relationship (for example 252A, 252B, 252C, 252D, and 252E) and/or a preset temporal positional relationship (for example the co-located position) with the image block to be processed is obtained in a certain order, and after pruning, N mutually distinct candidate predicted motion information items are finally obtained.
In a feasible implementation 10012, the obtaining of the N candidate predicted motion information items of the image block to be processed includes: obtaining, in a preset order, the motion information of M mutually distinct image blocks that have a preset positional relationship with the image block to be processed, as M candidate predicted motion information items, where the M candidate predicted motion information items include the N candidate predicted motion information items and M is an integer greater than N; determining a grouping manner for the M candidate predicted motion information items; and determining, according to the grouping manner, the N candidate predicted motion information items from the M candidate predicted motion information items.
For example, as shown in Table 1, seven mutually distinct candidate predicted motion information items, motion information 0 through motion information 6, are obtained by the method of implementation 10011; in this case M is 7. The seven candidate predicted motion information items are then grouped according to a preset grouping manner. In a feasible implementation 100121, all seven candidate predicted motion information items may be treated as one group, which is then the same as implementation 10012. In a feasible implementation 100122, the seven candidate predicted motion information items may be grouped by preset counts, for example into the first three, the middle three, and the last one; or into the first two, the middle three, and the last two; or into the first three and the last four. Neither the number of groups nor the number of candidate predicted motion information items in each group is limited. In a feasible implementation 100123, the grouping may follow the way the motion information was obtained, for example placing candidates obtained from spatially neighboring blocks in one group and candidates obtained from temporally neighboring blocks in another. In a feasible implementation 100124, the candidate predicted motion information includes coding unit level (CU level) motion information and sub-coding unit level (sub-CU level) motion information; for the specific obtaining method, refer to the sub-coding-unit-based motion vector prediction in Section 2.3.1 of JVET-F1001-v2, which is not repeated here, and the grouping may follow the CU level and the sub-CU level. Specifically, suppose the seven candidate predicted motion information items are divided into two groups, the first group being motion information 0-2 and the second group being motion information 3-6.
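The count-based grouping of implementation 100122 can be sketched as follows; the counts [3, 4] match the two-group split into motion information 0-2 and 3-6 used as the running example, and the function name is illustrative only:

```python
# Split a flat candidate list into consecutive groups of preset sizes.

def split_by_counts(candidates, counts):
    groups, start = [], 0
    for c in counts:
        groups.append(candidates[start:start + c])
        start += c
    return groups

motion_infos = list(range(7))                 # stand-ins for motion information 0-6
first, second = split_by_counts(motion_infos, [3, 4])
print(first)    # [0, 1, 2]
print(second)   # [3, 4, 5, 6]
```

The other groupings mentioned in the text (for example first three / middle three / last one) only change the `counts` argument.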
Different groups may be processed differently or in the same way, without limitation. For example, binarized strings may be assigned to indices 0-2 corresponding to motion information 0-2 of the first group in the way candidate predicted motion vector index values are represented in the Merge mode specified in the H.265 standard described above (referred to below as the conventional way), that is, binarized strings are assigned to indices 0-2 in the preset order, while binarized strings are assigned to indices 3-6 corresponding to motion information 3-6 of the second group according to the method described in S1002-S1005 of the embodiments of this application (referred to below as the way of the embodiments of this application). Alternatively, binarized strings may be assigned to both the first group and the second group according to the method described in S1002-S1005. It should be understood, however, that an assigned binarized string simultaneously encodes two pieces of information: the identifier of the group to which the corresponding motion information belongs and the identifier of that motion information within the group, so the binarized string allows any mode in any group to be distinguished from all six other modes.
Suppose the first group, motion information 0-2, is processed in the conventional way (for example, index 0 corresponds to the binarized string "0", index 1 to "10", and index 2 to "110"), and the second group, motion information 3-6, is processed in the way of the embodiments of this application; in this case N is 4.
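To see how one binarized string can identify both the group and the candidate within it, consider a sketch in which group 1 keeps the codewords "0", "10", "110" from the example above, and group 2 is assigned, purely as an assumption for illustration, the remaining prefix "111" followed by two bits. The check below confirms that the joint code is prefix-free (no codeword is a prefix of another), which is what allows a decoder to distinguish any one of the seven modes from all the others:

```python
# Joint codeword table over both groups; group 2 codewords are hypothetical.
codewords = {
    0: "0", 1: "10", 2: "110",                        # group 1 (from the text)
    3: "11100", 4: "11101", 5: "11110", 6: "11111",   # group 2 (assumed)
}

def is_prefix_free(codes):
    """True if no codeword is a proper prefix of another codeword."""
    values = list(codes.values())
    return not any(a != b and b.startswith(a) for a in values for b in values)

print(is_prefix_free(codewords))   # True
```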
When M is greater than N, that is, when any of the grouping implementations above is used, in a feasible implementation 10013 the grouping manner is encoded into the bitstream. For example, the grouping criteria (such as implementations 100121-100124) may each be assigned a number and that number encoded into the bitstream, or the number of groups and the number of candidate predicted motion information items in each group may be encoded into the bitstream, without limitation. Encoding the grouping manner into the bitstream enables the decoder to learn the grouping manner used by the encoder. In another feasible implementation 10014, the grouping manner is fixed identically at the encoder and the decoder by a preset protocol, so it does not need to be encoded into the bitstream.
Meanwhile, the encoder also needs to let the decoder know the specific candidate predicted motion information. In a feasible implementation 10015, the candidate predicted motion information is encoded into the bitstream; or second identification information indicating the image blocks having a preset positional relationship with the image block to be processed is encoded into the bitstream, specifically by assigning numbers to the image blocks from which the candidate predicted motion information is taken and which have a preset positional relationship with the block to be processed (for example the spatially neighboring blocks and the temporal co-located block in Merge mode), and encoding those numbers into the bitstream; or third identification information having a preset correspondence with the N candidate predicted motion information items is encoded into the bitstream, specifically by presetting several combinations of candidate predicted motion information, assigning each combination a number, and encoding that number into the bitstream. In another feasible implementation 10016, the candidate predicted motion information is fixed at the encoder and the decoder by a preset protocol, so it does not need to be encoded into the bitstream.
In a feasible implementation 10017, after the grouping manner is determined, the encoder further needs to let the decoder know how each group is processed. For example, before the subsequent steps are performed, a 0 or 1 flag may be encoded for each group to indicate whether the current group is processed in the conventional way or in the way of the embodiments of this application; or the syntax element used to represent the grouping manner may include an indication of how each group is processed; or the processing may be fixed at the encoder and the decoder by a preset protocol, for example agreeing that the first group is processed in the conventional way and the second group in the way of the embodiments of this application, or agreeing that when there is only one group it is processed in the way of the embodiments of this application, without limitation.
S1002. Determine that an adjacent reconstructed image block of the image block to be processed is available.
It should be understood that, depending on the application scenario, the adjacent reconstructed image blocks of the image block to be processed may include: spatially adjacent reconstructed image blocks located in the same frame as the image block to be processed; temporally adjacent reconstructed image blocks at the same position in a different frame; reconstructed image blocks of an adjacent view at the same position in a different frame at the same time instant; and reconstructed image blocks of an adjacent layer at the same position in a different frame at the same time instant, without limitation.
That an adjacent reconstructed image block is available means that it can be used by the present method. For example, when the left boundary of the image block to be processed is not an image boundary, the left neighboring block of the image block to be processed is available; when the upper boundary of the image block to be processed is not an image boundary, the upper neighboring block of the image block to be processed is available. In some cases, whether an adjacent reconstructed image block is available further depends on the configuration of other coding tools. For example, even if the left boundary of the image block to be processed is not an image boundary, when that boundary is the boundary of an image block group such as a slice or a tile, the left neighboring block may still be unavailable depending on the independence relationship between that image block group and the group to its left (corresponding to the case in which the image block groups are completely independent). Conversely, even if the left boundary of the image block to be processed is an image boundary, when other coding tools are configured to interpolate (pad) image samples beyond the image boundary, the left neighboring block of the image block to be processed is available.
In a feasible implementation 10021, when the adjacent reconstructed image blocks of the image block to be processed include at least two original adjacent reconstructed image blocks, the determining that an adjacent reconstructed image block of the image block to be processed is available includes: determining that at least one of the at least two original adjacent reconstructed image blocks is available. For example, when the original adjacent reconstructed image blocks of the image block to be processed include an upper adjacent reconstructed image block and a left adjacent reconstructed image block, and either of the two is available, it is determined that an adjacent reconstructed image block of the image block to be processed is available. Here, "original adjacent reconstructed image block" refers to an adjacent reconstructed image block of the image block to be processed, to distinguish it from the "reference adjacent reconstructed image block" mentioned later, which refers to an adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information of the image block to be processed.
It should be understood that when no adjacent reconstructed image block of the image block to be processed is available, the method of the embodiments of the present invention cannot use the similarity between the set of reconstructed pixels around the block to be processed and the corresponding set of reconstructed pixels around the reference image block to characterize the similarity between the block to be processed and the reference image block indicated by a candidate predicted motion vector. In some embodiments, identification information needs to be encoded to convey auxiliary information such as the grouping manner and/or the processing way of each group described above; in such embodiments, the availability of the adjacent reconstructed image blocks of the image block to be processed may also be determined first, and when no adjacent reconstructed image block is available, encoding may proceed directly in the conventional way without further encoding the auxiliary information, thereby saving coded bits.
S1003. Obtain a distortion value corresponding to each of the N candidate predicted motion information items.
The distortion value is used to compute the similarity between the set of reconstructed pixels around the block to be processed (the original adjacent reconstructed image block) and the corresponding set of reconstructed pixels around the reference image block (the reference adjacent reconstructed image block). The distortion value is determined from the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the image block to be processed. As shown in FIG. 14, the reference adjacent reconstructed image block has the same shape and size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as that between the original adjacent reconstructed image block and the image block to be processed.
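The distortion between the two equal-shaped templates can be computed, for instance, as a sum of absolute differences (SAD). SAD is a common choice for this kind of template comparison and is used here only as an illustrative assumption; the embodiment does not mandate a specific metric:

```python
# SAD between the original adjacent reconstructed pixels (template of the
# block to be processed) and the reference adjacent reconstructed pixels
# (template of the reference block). Lower SAD means higher similarity.

def sad(template_a, template_b):
    """Both templates are equal-shaped 2-D lists of reconstructed pixel values."""
    return sum(abs(a - b)
               for row_a, row_b in zip(template_a, template_b)
               for a, b in zip(row_a, row_b))

original_template = [[100, 102, 101], [99, 100, 98]]
reference_template = [[101, 100, 101], [97, 100, 100]]
print(sad(original_template, reference_template))   # 7
```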
Several exemplary shapes and sizes of the original adjacent reconstructed image block are described below. Assume that the image block to be processed is a rectangle of width W and height H, and that the original adjacent reconstructed image block is also a rectangle. In a feasible implementation 10031, the lower boundary of the original adjacent reconstructed image block adjoins the upper boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of W and a height of n. In a feasible implementation 10032, the lower boundary of the original adjacent reconstructed image block adjoins the upper boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of W+H and a height of n. In a feasible implementation 10033, the right boundary of the original adjacent reconstructed image block adjoins the left boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of n and a height of H. In a feasible implementation 10034, the right boundary of the original adjacent reconstructed image block adjoins the left boundary of the image block to be processed, and the original adjacent reconstructed image block has a width of n and a height of W+H. Here W, H, and n are positive integers. It should be understood that the choice of shape and size of the original adjacent reconstructed image block trades off implementation complexity against the accuracy of the similarity estimate; it is sufficient for the encoder and decoder to agree on the choice by protocol, and the choice is not limited herein.
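By way of illustration only, the geometry of implementations 10031-10034 can be sketched as follows. The function name and the coordinate convention (top-left origin, x rightward, y downward) are assumptions made for this sketch and are not part of the embodiments.

```python
def template_region(x, y, w, h, n, mode):
    """Return (x0, y0, width, height) of the original adjacent reconstructed
    image block for a to-be-processed block with top-left corner (x, y),
    width w, and height h, following implementations 10031-10034."""
    if mode == 10031:   # above the block: width W, height n
        return (x, y - n, w, n)
    if mode == 10032:   # above the block, extended: width W+H, height n
        return (x, y - n, w + h, n)
    if mode == 10033:   # left of the block: width n, height H
        return (x - n, y, n, h)
    if mode == 10034:   # left of the block, extended: width n, height W+H
        return (x - n, y, n, w + h)
    raise ValueError("unknown implementation mode")
```

For a 8x4 block at (16, 16) with n = 2, implementation 10031 yields the 8x2 row of pixels directly above the block.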
In a feasible implementation 10035, according to the storage-space constraints of the coding system, n may be set to 1 or 2, so that no additional storage space is needed to hold the original adjacent reconstructed image blocks, which simplifies a hardware implementation.
It should be understood that, because the reference adjacent reconstructed image block and the original adjacent reconstructed image block have the same shape, the same size, and the same positional relationship, each implementation of the reference adjacent reconstructed image block is exactly the same as that of the corresponding original adjacent reconstructed image block.
In S1003, the reference adjacent reconstructed image blocks of the reference image blocks of the image block to be processed, as indicated by the N pieces of candidate predicted motion information, are obtained first. The reference adjacent reconstructed image block of the reference image block of the image block to be processed may be determined, according to the motion compensation method described above, from the reference frame information and the motion vector represented by the candidate predicted motion information.
In a feasible implementation 10036, the motion vector in the candidate predicted motion information points to a sub-pixel position in the reference frame. In this case, the reference frame image, or a part of it, needs to be interpolated to sub-pixel precision to obtain the reference image block. The 8-tap filter {-1, 4, -11, 40, 40, -11, 4, -1} may be used for the sub-pixel interpolation, or, to reduce computational complexity, a bilinear interpolation filter may be used instead.
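A minimal sketch of the two interpolation options follows. The half-pel taps are those quoted above (they sum to 64, hence the rounding shift of 6); border handling, clipping, and the helper names are illustrative assumptions, not details of the embodiment.

```python
TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]  # 8-tap half-pel filter from the text

def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i+1] using the 8-tap filter.
    Assumes at least 3 integer samples of context on each side of i."""
    acc = sum(t * row[i - 3 + k] for k, t in enumerate(TAPS))
    return (acc + 32) >> 6  # taps sum to 64: normalize with rounding

def bilinear(ref, x, y):
    """Lower-complexity alternative: bilinear sample of 2-D frame `ref`
    at fractional position (x, y)."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = ref[y0][x0] * (1 - fx) + ref[y0][x0 + 1] * fx
    bot = ref[y0 + 1][x0] * (1 - fx) + ref[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy
```

On a constant-valued row the 8-tap filter reproduces the constant, which is a quick sanity check of the normalization.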
Then, a difference characterization value between the reference adjacent reconstructed image block of the reference image block and the original adjacent reconstructed image block of the block to be processed is computed and used as the distortion value.
The difference characterization value may be computed in a variety of ways, for example as the mean absolute difference (MAD), the sum of absolute differences (SAD), the sum of squared differences (SSD), the mean squared error, the sum of absolute Hadamard-transformed differences (SATD), a normalized cross-correlation measure, or a similarity measure based on sequential similarity detection. The purpose of computing the difference characterization value is to obtain the similarity (or degree of match) between the reference adjacent reconstructed image block of the reference image block and the original adjacent reconstructed image block of the corresponding block to be processed; any computation serving this purpose is applicable to the embodiments of this application, without limitation.
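Three of the listed measures can be sketched over flattened pixel lists as follows (a sketch only; real implementations operate on 2-D blocks and fixed-point arithmetic):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def ssd(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mad(a, b):
    """Mean absolute difference."""
    return sad(a, b) / len(a)
```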
When there are multiple original adjacent reconstructed image blocks, assume, for example, that the multiple original adjacent reconstructed image blocks include a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block, and correspondingly that the multiple reference adjacent reconstructed image blocks include a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block. In this case, representing the distortion value by the difference characterization value between the reference adjacent reconstructed image blocks and the original adjacent reconstructed image blocks includes: representing the distortion value by the sum of the difference characterization value between the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value between the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block. More generally, the distortion value is obtained according to the following formula:
Distortion = ∑_{i=1}^{p} |Delta(Original_i, Reference_i)|    (1)
Here, Distortion denotes the distortion value, |Delta(Original_i, Reference_i)| denotes the difference characterization value between the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p denotes the number of original adjacent reconstructed image blocks used to compute the distortion value. Depending on the difference measure actually used, Delta is the expression of MAD, SAD, SSD, or any of the other computations listed above.
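Formula (1) can be sketched as follows, with SAD chosen as the Delta term purely for illustration; any of the difference measures above may be substituted.

```python
def sad(a, b):
    """SAD chosen as the Delta term of formula (1) for illustration."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distortion(originals, references):
    """Formula (1): Distortion = sum over i of |Delta(Original_i, Reference_i)|,
    where i runs over the p original/reference template pairs."""
    if len(originals) != len(references):
        raise ValueError("need one reference template per original template")
    return sum(sad(o, r) for o, r in zip(originals, references))
```

With an above template and a left template (p = 2), the two per-template SADs are simply added.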
In a feasible implementation 10037, this embodiment of the present invention is applied to bidirectional inter prediction. Assume that the reference image blocks indicated by the candidate predicted motion information include a first reference image block and a second reference image block, and correspondingly that the adjacent reconstructed image blocks of the reference image blocks indicated by the candidate predicted motion information include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, representing the distortion value by the difference characterization value between the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: representing the distortion value by the difference characterization value between an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing the pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or representing the distortion value by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value between the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value between the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
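The two bidirectional variants of implementation 10037 can be sketched as follows (SAD as the difference measure and rounded integer averaging are assumptions of this sketch):

```python
def sad(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def bidir_avg_template(orig, ref0, ref1):
    """Variant 1: average the two reference templates pixel-wise, then
    compare the average against the original template."""
    avg = [(p + q + 1) // 2 for p, q in zip(ref0, ref1)]  # rounded mean
    return sad(orig, avg)

def bidir_avg_diff(orig, ref0, ref1):
    """Variant 2: compute one difference value per prediction direction
    and take their mean."""
    return (sad(orig, ref0) + sad(orig, ref1)) / 2
```

Note that the two variants generally differ: when the two directions' errors cancel pixel-wise, variant 1 reports a smaller distortion than variant 2.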
In a feasible implementation 10038, the image block to be processed has candidate predicted motion information at the sub-block level. As shown in FIG. 15, the distortion value corresponding to each sub-block adjacent to the original adjacent reconstructed image block may be obtained separately, and the values summed to give the distortion value of the image block to be processed. Specifically and by way of example, the corresponding reference image blocks Ref Sub-CU1 and Ref Sub-CU2 are found according to the motion information of Sub-CU1 and Sub-CU2 in the block to be processed, the original reconstructed image blocks T1 and T2 and the corresponding reference reconstructed image blocks T1' and T2' are then determined, and finally the distortion value is obtained by the method shown in formula (1).
S1004. Determine, according to the magnitude relationship among the N obtained distortion values, first identification information for each of the N pieces of candidate predicted motion information, the N pieces of candidate predicted motion information being in one-to-one correspondence with their respective first identification information.
Once all N pieces of candidate predicted motion information have obtained their corresponding distortion values by the method in S1003, that is, once N distortion values have been obtained, the first identification information is the binarized representation, or binary string, of the identifier of each piece of candidate predicted motion information.
In S1004, the magnitudes of the N distortion values are compared first. Specifically, the N pieces of candidate predicted motion information may be arranged in ascending or descending order of their distortion values. Assume they are arranged in ascending order of distortion value, that is, a piece of candidate predicted motion information ranked earlier corresponds to a smaller distortion value.
Then, the first identification information is assigned to each of the N pieces of candidate predicted motion information according to the comparison result, where the length of the binary string of the first identification information of a piece of candidate predicted motion information with a smaller distortion value is less than or equal to the length of the binary string of the first identification information of a piece of candidate predicted motion information with a larger distortion value.
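The sort-then-assign rule above can be sketched with a truncated-unary binarization, in which the candidate with the i-th smallest distortion receives i leading zeros and the last codeword drops the terminating 1. This is only one admissible code; the embodiment permits any variable-length code satisfying the length constraint.

```python
def assign_unary_codes(distortions):
    """Assign shorter truncated-unary codewords to candidates with smaller
    distortion values. Returns {candidate_index: binary string}."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    codes = {}
    for rank, cand in enumerate(order):
        if rank < len(order) - 1:
            codes[cand] = "0" * rank + "1"
        else:
            codes[cand] = "0" * rank  # last codeword needs no terminating 1
    return codes
```

For distortions [30, 10, 20], candidate 1 (smallest distortion) receives the 1-bit codeword, satisfying the monotone length requirement.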
For example, taking Table 1 as an example, motion information 0-2 is divided into a first group, and motion information 3-6 is divided into a second group.
When the first group is processed in the conventional manner, the second group is processed in the manner of this embodiment of the application, and the order of distortion values from small to large is mode 6, mode 4, mode 5, mode 3, the following exemplary index coding applies. It should be understood that this assignment of binary strings is merely exemplary; other variable-length coding schemes may also be used, without limitation:
Table 2 (exemplary index codeword assignment; published as image PCTCN2018102632-appb-000006 in the original document)
When the first group is processed in the manner of this embodiment of the application, with the order of distortion values from small to large being mode 0, mode 2, mode 1, and the second group is processed in the conventional manner, the following exemplary index coding applies:
Table 3 (exemplary index codeword assignment; published as image PCTCN2018102632-appb-000007 in the original document)
When the first group is processed in the manner of this embodiment of the application, with the order of distortion values from small to large being mode 0, mode 2, mode 1, and the second group is also processed in the manner of this embodiment of the application, with the order of distortion values from small to large being mode 6, mode 4, mode 5, mode 3, the following exemplary index coding applies:
Table 4 (exemplary index codeword assignment; published as images PCTCN2018102632-appb-000008 and PCTCN2018102632-appb-000009 in the original document)
It should be understood that, among the multiple groups into which the N pieces of candidate predicted motion information are divided, a later group must take the earlier groups into account when determining the binary representation of the identifier values of the candidate predicted motion information within it, so that its codewords remain distinguishable from those of the earlier groups. It should also be understood that, in some embodiments, when the position of a group and the number of pieces of candidate predicted motion information within it can be learned by the decoder through other means, a later group may determine the binary representation of its identifier values without regard to the earlier groups.
S1005. When the target predicted motion information of the image block to be processed is one of the N pieces of candidate predicted motion information for which first identification information has been determined, encode the first identification information of the target predicted motion information into the bitstream.
Trial encoding is performed with each piece of candidate predicted motion information in turn. The entire encoding process may be simulated, or only part of it (for example, only reconstructing the image block, without entropy coding), to obtain the coding cost of each piece of candidate predicted motion information. The coding cost is computed from the degree of distortion of the reconstructed image block and/or the number of coding bits spent in the simulated encoding of the image block. According to actual needs, a suitable piece of candidate predicted motion information, for example the one with the lowest coding cost, is selected as the target predicted motion information actually used for encoding, and its identifier (for example, an index value) is encoded into the bitstream using the binary string determined in step S1004.
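The candidate selection above can be sketched as a minimum-cost search over simulated (distortion, bits) pairs. The additive cost form J = D + lambda * R and the lambda value are assumptions of this sketch, not details mandated by the embodiment, which only requires the cost to be computed from distortion and/or coding bits.

```python
def rd_cost(distortion, bits, lam=4.0):
    """Simulated coding cost J = D + lambda * R (lambda value illustrative)."""
    return distortion + lam * bits

def pick_target(candidates):
    """Return the candidate id whose simulated (distortion, bits) pair gives
    the smallest coding cost; `candidates` maps id -> (distortion, bits)."""
    return min(candidates, key=lambda c: rd_cost(*candidates[c]))
```

With candidates {"mode4": (10, 2), "mode6": (6, 1), "mode3": (5, 4)}, mode6 wins despite mode3 having the lowest distortion, because mode3's longer codeword raises its total cost.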
The above process of determining the target predicted motion information is generally referred to as obtaining the target predicted motion information by a rate-distortion optimization (RDO) criterion. For the specific steps and various feasible simplifications, refer to the encoder implementations of reference software such as HM and JEM; details are not repeated here.
FIG. 16 is a schematic flowchart 1100 of a decoding method according to an embodiment of this application. It should be understood that, in general, the decoding process is the inverse of the encoding process. The syntax elements written into the bitstream in sequence during encoding need to be parsed at the decoder in the corresponding order and at the corresponding positions, to complete reconstruction of the video image at the decoder. Assume that the decoding method shown in FIG. 16 corresponds to the encoding method shown in FIG. 13.
S1101. Determine the grouping manner of the M pieces of candidate predicted motion information.
It should be understood that step S1101 is optional, depending on the embodiment. For example, when the encoder and the decoder determine the grouping manner according to a pre-agreed protocol, the decoder learns the grouping manner through that protocol, and this step does not exist in actual execution. When the encoder instead signals the grouping manner to the decoder by transmitting identification information, this step needs to be performed. The specific grouping manner is kept consistent with that of the encoder; for details, refer to the exemplary description in step S1001 of determining the grouping manner of the M pieces of candidate predicted motion information and, according to the grouping manner, determining the N pieces of candidate predicted motion information from the M pieces of candidate predicted motion information. Details are not repeated here.
Further, corresponding to the encoder, the decoder also needs to learn the processing manner of each group, or of the group in which the target predicted motion information is located. Optionally, this may be learned in advance according to the protocol between the encoder and the decoder, or by parsing identification information in the bitstream. That is, the preset grouping manner is determined, or the grouping manner is obtained by parsing the bitstream. For details, refer to the method described in implementation 10017; details are not repeated here.
S1102. Parse, from the bitstream, target identification information of the target predicted motion information of the image block to be processed.
Similarly, corresponding to encoder step S1005, parsing the bitstream yields the identifier, that is, the binary string, of the candidate predicted motion information actually used for encoding.
S1103. Determine that an adjacent reconstructed image block of the image block to be processed is available.
This step corresponds to encoder step S1002, and the content is the same; each of the feasible implementations of S1002 may be used in S1103, and details are not repeated here.
When the adjacent reconstructed image blocks of the image block to be processed include at least two original adjacent reconstructed image blocks, determining that an adjacent reconstructed image block of the image block to be processed is available includes: determining that at least one of the at least two original adjacent reconstructed image blocks is available.
It should be understood that, corresponding to the encoder, in some embodiments identification information needs to be encoded for auxiliary information such as the above grouping manner and/or the processing manner of each group. In such embodiments, the availability of the adjacent reconstructed image blocks of the image block to be processed may also be determined first; when no adjacent reconstructed image block is available, encoding may proceed directly in the conventional manner without further encoding the above auxiliary information, thereby saving coding bits. Correspondingly, at the decoder, the related auxiliary information is not parsed when no adjacent reconstructed image block is available.
It should be understood that there is no mandatory order between steps S1103 and S1102; they may also be processed in parallel, without limitation.
S1104. Determine N pieces of candidate predicted motion information.
The N pieces of candidate predicted motion information include the target predicted motion information, where N is an integer greater than 1. Specifically, this step includes: obtaining, in a preset order, the motion information of N mutually different image blocks that have a preset positional relationship with the image block to be processed, as the N pieces of candidate predicted motion information. Alternatively, this step includes: obtaining, in a preset order, the motion information of M mutually different image blocks that have a preset positional relationship with the image block to be processed, as M pieces of candidate predicted motion information, where the M pieces of candidate predicted motion information include the N pieces of candidate predicted motion information and M is an integer greater than N; and determining, according to the target identification information and the grouping manner, the N pieces of candidate predicted motion information from the M pieces of candidate predicted motion information.
This step corresponds to encoder step S1001, and the content is the same; each of the feasible implementations of S1001 may be used in S1104, and details are not repeated here.
Similarly, corresponding to the encoder, the decoder also needs to learn the specific candidate predicted motion information. Optionally, this may be learned according to a method pre-agreed with the encoder, or by parsing candidate predicted motion information or identification information in the bitstream. That is: parse the coded information of the multiple pieces of candidate predicted motion information in the bitstream to obtain the N pieces of candidate predicted motion information; or parse second identification information in the bitstream to obtain the N candidate image blocks indicated by the second identification information, and use the motion information of the N candidate image blocks as the N pieces of candidate predicted motion information; or parse third identification information in the bitstream to obtain the N pieces of candidate predicted motion information that have a preset correspondence with the third identification information. For details, refer to implementations 10015 and 10016; details are not repeated here.
Note, however, that relative to the encoder, the decoder implementation may differ somewhat in some embodiments. In general, at the encoder, every piece of candidate predicted motion information may become the target predicted motion information actually used for encoding, so all candidate predicted motion information must be determined. At the decoder, it suffices to determine only the candidate predicted motion information needed to determine the target predicted motion information, rather than all of it. In some application scenarios, this reduces the implementation complexity of the decoder.
For example, for the embodiment corresponding to Table 1, if modes 0-1 are in the first group, modes 2-4 in the second group, and modes 5-6 in the third group, then regardless of how binary strings are assigned within each group to the index values of the candidate predicted motion information of the different modes, it can be determined that the binary strings corresponding to the first group are "1" and "01", those corresponding to the second group are "001", "0001", and "00001", and those corresponding to the third group are "000001" and "000000". When parsing the bitstream yields "001" as the identifier of the target predicted motion information, it follows that the target predicted motion information belongs to the second group. Therefore, the binary strings corresponding to the index values of the candidate predicted motion information in the first and third groups need not be determined; only those of the second group need be determined.
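The decoder-side shortcut described here, identifying the group directly from the codeword, can be sketched for unary-style codewords like those in the example (function name and the leading-zero-count convention are assumptions of this sketch):

```python
def parse_codeword(bits, group_sizes):
    """Given a unary-style codeword and the per-group candidate counts,
    return (group_index, rank_within_group). The global rank equals the
    number of leading zeros; the all-zero codeword is the last candidate."""
    rank = len(bits) if "1" not in bits else bits.index("1")
    for g, size in enumerate(group_sizes):
        if rank < size:
            return g, rank
        rank -= size
    raise ValueError("codeword out of range for the given groups")
```

With group sizes [2, 3, 2] as in the example, "001" maps to the first candidate of the second group, so the decoder can skip determining the candidates of the other two groups.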
S1105. Obtain the distortion value corresponding to each of the N pieces of candidate predicted motion information.
This step corresponds to encoder step S1003, and the content is the same; each of the feasible implementations of S1003 may be used in S1105, and details are not repeated here.
That is, in a feasible implementation 11051, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, and the adjacent reconstructed image block of the image block to be processed includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block. Determining the distortion value from the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the image block to be processed includes: representing the distortion value by the difference characterization value between the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block has the same shape and size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the image block to be processed.
In a feasible implementation 11052, the difference characterization value between the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute difference between the reference adjacent reconstructed image block and the original adjacent reconstructed image block; the sum of absolute differences between them; the sum of squared differences between them; the mean squared error between them; the sum of absolute Hadamard-transformed differences between them; a normalized cross-correlation measure between them; or a similarity measure between them based on sequential similarity detection.
In a feasible implementation 11053, the image block to be processed is a rectangle of width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block adjoining the upper boundary of the image block to be processed includes: the original adjacent reconstructed image block having a width of W and a height of n; or the original adjacent reconstructed image block having a width of W+H and a height of n; where W, H, and n are positive integers.
In a feasible implementation 11054, the right boundary of the original adjacent reconstructed image block adjoining the left boundary of the image block to be processed includes: the original adjacent reconstructed image block having a width of n and a height of H; or the original adjacent reconstructed image block having a width of n and a height of W+H.
In a feasible implementation 11055, n is 1 or 2.
In a feasible implementation 11056, the reference image blocks indicated by the candidate predicted motion information include a first reference image block and a second reference image block, and correspondingly the adjacent reconstructed image blocks of the reference image blocks indicated by the candidate predicted motion information include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, representing the distortion value by the difference characterization value between the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: representing the distortion value by the difference characterization value between an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing the pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or representing the distortion value by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value between the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value between the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
In a feasible implementation 11057, the adjacent reconstructed image blocks of the reference image block indicated by the candidate predicted motion information include a plurality of the reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block. Correspondingly, the adjacent reconstructed image blocks of the to-be-processed image block include a plurality of the original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value between the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value between the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value between the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
In a feasible implementation 11058, the distortion value is obtained according to the following formula:
Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|
where Distortion denotes the distortion value, |Delta(Original_i, Reference_i)| denotes the difference characterization value between the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p denotes the number of original adjacent reconstructed image blocks used to compute the distortion value.
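The formula above sums per-template difference values. As an illustrative sketch (not the normative implementation), taking SAD, the metric used in the later examples, as the difference characterization value:

```python
def sad(block_a, block_b):
    # Sum of absolute differences between two equally sized pixel lists.
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def distortion(original_templates, reference_templates):
    # Distortion = sum over i = 1..p of |Delta(Original_i, Reference_i)|,
    # with SAD standing in for the difference characterization value.
    assert len(original_templates) == len(reference_templates)
    return sum(sad(o, r) for o, r in zip(original_templates, reference_templates))
```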
S1106: Determine, according to the magnitude relationship among the N obtained distortion values, first identification information for each of the N candidate predicted motion information, the N candidate predicted motion information being in one-to-one correspondence with their respective first identification information.

This step corresponds to step S1004 at the encoding end and is consistent with it; the feasible implementations of S1004 are all applicable to S1106 and are not repeated here.
That is, in a feasible implementation 11061, determining, according to the magnitude relationship among the N obtained distortion values, the first identification information of each of the N candidate predicted motion information, the N candidate predicted motion information being in one-to-one correspondence with their respective first identification information, includes: comparing the magnitudes of the N distortion values; and assigning the first identification information to each of the N candidate predicted motion information according to the comparison result, where the length of the binary string of the first identification information of a candidate predicted motion information with a smaller distortion value is less than or equal to the length of the binary string of the first identification information of a candidate predicted motion information with a larger distortion value.

In a feasible implementation 11062, comparing the magnitudes of the N distortion values includes: arranging the N candidate predicted motion information in ascending or descending order of their distortion values.
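The ordering step can be sketched as a simple argsort; shorter identifier strings are then given to candidates earlier in the returned order. The function name is illustrative:

```python
def rank_by_distortion(distortions):
    # Returns candidate indices ordered from smallest to largest distortion.
    # The candidate at rank 0 will receive the shortest binary string.
    return sorted(range(len(distortions)), key=lambda i: distortions[i])
```

For example, with distortions SAD0 = 30, SAD1 = 50, SAD2 = 10, the order is candidate 2, then 0, then 1, matching the SAD2 < SAD0 < SAD1 example used later.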
S1107: Determine the target predicted motion information from the N candidate predicted motion information.

The first identification information of the target predicted motion information matches the target identification information; that is, the candidate predicted motion information corresponding to the first identification information that matches the target identification information is determined as the target predicted motion information.

Generally, that the first identification information of the target predicted motion information matches the target identification information means that the first identification information and the target identification information are equal.
Exemplarily, following the exemplary implementation described in S1104, suppose the binarized strings of the index values of the candidate predicted motion information obtained according to S1106 are: mode 2 "0001", mode 3 "001", and mode 4 "00001". Combined with the identifier "001" of the target predicted motion information parsed in S1102, it can be determined that the target predicted motion information is the candidate predicted motion information corresponding to mode 3.
In some embodiments, the grouping information used to determine the N candidate predicted motion information from the M candidate predicted motion information is represented by additional information. Exemplarily, the encoding end transmits a group identifier to the decoding end so that the decoding end learns the group in which the target predicted motion information is located. The target identification information of the target predicted motion information parsed in S1102 may then be used only to distinguish the different candidate predicted motion information within that group, i.e., as intra-group index information. In this case, matching the first identification information of the target predicted motion information with the target identification information means combining the group identifier with the target identification information of the target predicted motion information to look up the candidate predicted motion information represented by the corresponding first identification information.
When the to-be-processed block has a plurality of candidate predicted motion vectors, the similarity between the to-be-processed block and the reference image blocks indicated by its candidate predicted motion vectors is used as prior knowledge to assist in determining how the identifier of each candidate predicted motion vector is encoded, thereby saving coding bits and improving coding efficiency.
An encoding method 1200 according to an embodiment of the present invention is described in detail below.
S1201: With reference to FIG. 8, for the to-be-processed block 250, sequentially check the motion information at position 252A, the motion information at position 252B, the motion information at position 252C, the motion information at position 252D, the motion information obtained for block 250 in ATMVP mode, the motion information obtained for block 250 in STMVP mode, the motion information of block 252E, and the motion information obtained for block 250 in TMVP mode. The check covers: (1) whether the motion information is available (available here in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used by the embodiments of the present application given the properties of other coding tools, such as the prediction mode and the partitioning mode); (2) whether the motion information duplicates previously checked motion information. Motion information that is available and does not duplicate previously checked motion information is collected in order until the count reaches five; the five pieces of motion information obtained in order may be denoted MV0, MV1, MV2, MV3, and MV4, respectively.
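The availability and duplication checks above amount to an ordered, pruned insertion. A minimal sketch, with `None` standing in for unavailable motion information (an assumption of this illustration):

```python
def collect_candidates(sources, limit):
    # Scan candidate positions/modes in their fixed order, keeping motion
    # information that is available and not a duplicate, up to `limit`.
    out = []
    for mv in sources:
        if mv is not None and mv not in out:
            out.append(mv)
            if len(out) == limit:
                break
    return out
```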
S1202: Suppose the size of block 250 is 16x8, the adjacent reconstructed image block on its upper boundary (the upper template for short) is 16x1, and the adjacent reconstructed image block on its left boundary (the left template for short) is 1x8. Check whether the upper template and the left template of block 250 exist.

When neither the upper template nor the left template exists, this procedure ends, and the identifier of the predicted motion information of the to-be-processed block is encoded according to the Merge-mode prediction method in the JEM reference software.

Otherwise, when at least one template exists, this procedure continues.
S1203: According to a protocol preset at the encoding and decoding ends, MV0, MV1, and MV2 are placed in a first group, and MV3 and MV4 in a second group.

S1204: Obtain the reference image blocks REF0, REF1, and REF2 of block 250 according to MV0, MV1, and MV2, respectively.
S1205: Taking MV0 as an example, perform the following operations on each of MV0, MV1, and MV2 in the first group:

According to the templates that exist for block 250, suppose both the left template TL and the upper template TA of block 250 exist. Determine the left template TL0 and the upper template TA0 corresponding to REF0, where TL0 has the same size as TL and a corresponding position, and TA0 has the same size as TA and a corresponding position.

Compute the SAD of the pixel values of TL0 and TL to obtain SAD01, compute the SAD of the pixel values of TA0 and TA to obtain SAD02, and add SAD01 and SAD02 to obtain SAD0 as the distortion value corresponding to MV0.

Similarly, obtain the distortion value SAD1 corresponding to MV1 and the distortion value SAD2 corresponding to MV2.
S1206: Sort SAD0, SAD1, and SAD2 in ascending order; suppose the ascending order is SAD2 < SAD0 < SAD1.

S1207: According to the magnitude relationship of the distortion values, assign bin strings to MV0, MV1, and MV2 as follows:

MV2 corresponds to "1", MV0 corresponds to "01", and MV1 corresponds to "001".

S1208: Assign bin strings to MV3 and MV4 as follows:

MV3 corresponds to "0001" and MV4 corresponds to "0000".
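The bin strings in S1207 and S1208 follow a truncated unary pattern over the five candidates ("1", "01", "001", "0001", "0000"), in which the last codeword drops the terminating "1" because no further candidate remains to disambiguate. A sketch under that assumption:

```python
def truncated_unary(rank, num_candidates):
    # rank 0 -> "1", rank 1 -> "01", rank 2 -> "001", ...;
    # the final rank is all zeros (truncated unary).
    if rank < num_candidates - 1:
        return "0" * rank + "1"
    return "0" * (num_candidates - 1)
```

With five candidates this reproduces exactly the codewords assigned to MV2, MV0, MV1, MV3, and MV4 above.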
S1209: Perform rate-distortion calculation according to the bin strings assigned to the pieces of motion information in the preceding steps, and select the motion information with the smallest rate-distortion cost (fewer coding bits for the same reconstructed-image distortion, or smaller reconstructed-image distortion for the same coding bits) as the finally selected predicted motion information of block 250.

S1210: When this inter prediction mode is finally selected as the actual coding mode of block 250, write the bin string corresponding to the predicted motion information finally selected in S1209 into the bitstream by entropy coding.
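The selection in S1209 can be sketched as minimizing a Lagrangian cost J = D + λ·R. This is only an illustration: the rate R is approximated here by the bin-string length in bits, and the multiplier `lam` is a hypothetical parameter, not a value specified by this method:

```python
def rd_select(candidates, lam):
    # candidates: list of (name, reconstruction_distortion, bin_string).
    # Cost J = D + lambda * R, with R approximated by the codeword length.
    return min(candidates, key=lambda c: c[1] + lam * len(c[2]))[0]
```

For example, with distortions 100, 90, 95 for MV2, MV0, MV1 and λ = 8, MV0 wins despite its longer codeword because its reconstruction distortion is lowest.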
A decoding method 1300 according to an embodiment of the present invention is described in detail below. This embodiment corresponds to the encoding method 1200.
S1301: Consistently with the encoding end, with reference to FIG. 8, for the to-be-processed block 250, suppose the size of block 250 is 16x8, the adjacent reconstructed image block on its upper boundary (the upper template for short) is 16x1, and the adjacent reconstructed image block on its left boundary (the left template for short) is 1x8. Check whether the upper template and the left template of block 250 exist.

When neither the upper template nor the left template exists, this procedure ends, and the identifier of the predicted motion information of the to-be-processed block is decoded according to the Merge-mode prediction method in the JEM reference software.

Otherwise, when at least one template exists, this procedure continues.
S1302: Parse the bitstream to obtain the bin string corresponding to the predicted motion information identifier of block 250; suppose it is "001". According to a protocol preset at the decoding end, consistent with the encoding end, the first group has three pieces of motion information and the second group has two; "001" represents the third piece of motion information (index value 2), so only the three pieces of motion information of the first group need to be determined, and the motion information of the second group does not.
S1303: Sequentially check the motion information at position 252A, the motion information at position 252B, the motion information at position 252C, the motion information at position 252D, the motion information obtained for block 250 in ATMVP mode, the motion information obtained for block 250 in STMVP mode, the motion information of block 252E, and the motion information obtained for block 250 in TMVP mode. The check covers: (1) whether the motion information is available (available here in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used by the embodiments of the present application given the properties of other coding tools, such as the prediction mode and the partitioning mode); (2) whether the motion information duplicates previously checked motion information. Motion information that is available and does not duplicate previously checked motion information is collected in order until the count reaches the three determined in S1302; consistently with the encoding end, the three pieces of motion information obtained in order are denoted MV0, MV1, and MV2, respectively.

S1304: Obtain the reference image blocks REF0, REF1, and REF2 of block 250 according to MV0, MV1, and MV2, respectively.
S1305: Taking MV0 as an example, perform the following operations on each of MV0, MV1, and MV2:

According to the templates that exist for block 250, consistently with the encoding end, suppose both the left template TL and the upper template TA of block 250 exist. Determine the left template TL0 and the upper template TA0 corresponding to REF0, where TL0 has the same size as TL and a corresponding position, and TA0 has the same size as TA and a corresponding position.

Compute the SAD of the pixel values of TL0 and TL to obtain SAD01, compute the SAD of the pixel values of TA0 and TA to obtain SAD02, and add SAD01 and SAD02 to obtain SAD0 as the distortion value corresponding to MV0.

Similarly, obtain the distortion value SAD1 corresponding to MV1 and the distortion value SAD2 corresponding to MV2.
S1306: Sort SAD0, SAD1, and SAD2 in ascending order; consistently with the encoding end, the ascending order is SAD2 < SAD0 < SAD1.

S1307: According to the magnitude relationship of the distortion values, assign bin strings to MV0, MV1, and MV2 as follows:

MV2 corresponds to "1", MV0 corresponds to "01", and MV1 corresponds to "001".

S1308: Compare the bin strings assigned to the pieces of motion information in the preceding steps with the bin string "001" corresponding to the predicted motion information identifier of block 250 parsed from the bitstream. Since the bin string of MV1 is also "001", MV1 is determined to be the predicted motion information of block 250.
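Because the decoder derives the same distortion ranking, and hence the same bin strings, as the encoder, S1308 reduces to a lookup. A sketch, with illustrative names:

```python
def match_parsed_identifier(parsed_bits, assigned):
    # assigned: mapping from candidate name to its derived bin string.
    # The candidate whose bin string equals the parsed identifier is selected.
    for name, bits in assigned.items():
        if bits == parsed_bits:
            return name
    raise ValueError("no candidate matches the parsed identifier")
```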
An encoding method 1400 according to an embodiment of the present invention is described in detail below.
S1401: With reference to FIG. 8, for the to-be-processed block 250, sequentially check the motion information at position 252A, the motion information at position 252B, the motion information at position 252C, the motion information at position 252D, the motion information obtained for block 250 in ATMVP mode, the motion information obtained for block 250 in STMVP mode, the motion information of block 252E, and the motion information obtained for block 250 in TMVP mode. The check covers: (1) whether the motion information is available (available here in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used by the embodiments of the present application given the properties of other coding tools, such as the prediction mode and the partitioning mode); (2) whether the motion information duplicates previously checked motion information. Motion information that is available and does not duplicate previously checked motion information is collected in order until the count reaches six; the six pieces of motion information obtained in order may be denoted MV0, MV1, MV2, MV3, MV4, and MV5, respectively.
S1402: Suppose the size of block 250 is 16x16, the adjacent reconstructed image block on its upper boundary (the upper template for short) is 32x1, and the adjacent reconstructed image block on its left boundary (the left template for short) is 1x32. Check whether the upper template and the left template of block 250 exist.

When neither the upper template nor the left template exists, this procedure ends, and the identifier of the predicted motion information of the to-be-processed block is encoded according to the Merge-mode prediction method in the JEM reference software.

Otherwise, when at least one template exists, this procedure continues.
S1403: According to a protocol preset at the encoding and decoding ends, MV0, MV1, and MV2 are placed in a first group, and MV3, MV4, and MV5 in a second group.

S1404: Obtain the reference image blocks REF0, REF1, REF2, REF3, REF4, and REF5 of block 250 according to MV0, MV1, MV2, MV3, MV4, and MV5, respectively.
S1405: Taking MV0 as an example, perform the following operations on each of MV0, MV1, MV2, MV3, MV4, and MV5:

According to the templates that exist for block 250, suppose only the upper template TA of block 250 exists. Determine the upper template TA0 corresponding to REF0, where TA0 has the same size as TA and a corresponding position.

Compute the SAD of the pixel values of TA0 and TA to obtain SAD0 as the distortion value corresponding to MV0.

Similarly, obtain the distortion value SAD1 corresponding to MV1, SAD2 corresponding to MV2, SAD3 corresponding to MV3, SAD4 corresponding to MV4, and SAD5 corresponding to MV5.
S1406: Sort SAD0, SAD1, and SAD2 in ascending order; suppose the ascending order is SAD2 < SAD0 < SAD1.

S1407: According to the magnitude relationship of the distortion values, assign bin strings to MV0, MV1, and MV2 as follows:

MV2 corresponds to "1", MV0 corresponds to "01", and MV1 corresponds to "001".

S1408: Sort SAD3, SAD4, and SAD5 in ascending order; suppose the ascending order is SAD5 < SAD3 < SAD4.

S1409: According to the magnitude relationship of the distortion values, assign bin strings to MV3, MV4, and MV5 as follows:

MV5 corresponds to "0001", MV3 corresponds to "00001", and MV4 corresponds to "00000".
S1410: Perform rate-distortion calculation according to the bin strings assigned to the pieces of motion information in the preceding steps, and select the motion information with the smallest rate-distortion cost (fewer coding bits for the same reconstructed-image distortion, or smaller reconstructed-image distortion for the same coding bits) as the finally selected predicted motion information of block 250.

S1411: When this inter prediction mode is finally selected as the actual coding mode of block 250, write the bin string corresponding to the predicted motion information finally selected in S1410 into the bitstream by entropy coding.
A decoding method 1500 according to an embodiment of the present invention is described in detail below. This embodiment corresponds to the encoding method 1400.
S1501: Consistently with the encoding end, with reference to FIG. 8, for the to-be-processed block 250, suppose the size of block 250 is 16x16, the adjacent reconstructed image block on its upper boundary (the upper template for short) is 32x1, and the adjacent reconstructed image block on its left boundary (the left template for short) is 1x32. Check whether the upper template and the left template of block 250 exist.

When neither the upper template nor the left template exists, this procedure ends, and the identifier of the predicted motion information of the to-be-processed block is decoded according to the Merge-mode prediction method in the JEM reference software.

Otherwise, when at least one template exists, this procedure continues.
S1502: Parse the bitstream to obtain the bin string corresponding to the predicted motion information identifier of block 250; suppose it is "0001". According to a protocol preset at the decoding end, consistent with the encoding end, the first group has three pieces of motion information and the second group has three; "0001" represents the fourth piece of motion information, so only the three pieces of motion information of the second group need to undergo the steps corresponding to S1408 and S1409.
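Under the truncated-unary reading of the bin strings, the decoder can infer a candidate's rank, and therefore its group, directly from the parsed string. A sketch using the group sizes (three and three) of this example; function names are illustrative:

```python
def rank_from_bits(bits):
    # Inverse truncated unary: the count of leading zeros gives the rank;
    # an all-zero string denotes the last rank.
    return bits.index("1") if "1" in bits else len(bits)

def group_index(bits, group_sizes):
    # Map the rank onto the preset groups (e.g. sizes [3, 3]).
    rank = rank_from_bits(bits)
    total = 0
    for g, size in enumerate(group_sizes):
        total += size
        if rank < total:
            return g
    raise ValueError("rank exceeds candidate count")
```

Parsing "0001" yields rank 3, the fourth candidate, which falls in the second group, so only that group's distortion values need to be derived.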
S1503: Sequentially check the motion information at position 252A, the motion information at position 252B, the motion information at position 252C, the motion information at position 252D, the motion information obtained for block 250 in ATMVP mode, the motion information obtained for block 250 in STMVP mode, the motion information of block 252E, and the motion information obtained for block 250 in TMVP mode. The check covers: (1) whether the motion information is available (available here in a broad sense, covering not only whether the image block corresponding to the motion information exists, but also whether the motion information can be used by the embodiments of the present application given the properties of other coding tools, such as the prediction mode and the partitioning mode); (2) whether the motion information duplicates previously checked motion information. Motion information that is available and does not duplicate previously checked motion information is collected in order until the count reaches the six determined in S1502; consistently with the encoding end, the six pieces of motion information obtained in order are denoted MV0, MV1, MV2, MV3, MV4, and MV5, respectively.

S1506: Obtain the reference image blocks REF3, REF4, and REF5 of block 250 according to MV3, MV4, and MV5, respectively.
S1507: Taking MV3 as an example, perform the following operations on each of MV3, MV4, and MV5:

According to the templates that exist for block 250, consistently with the encoding end, suppose only the upper template TA of block 250 exists. Determine the upper template TA3 corresponding to REF3, where TA3 has the same size as TA and a corresponding position.

Compute the SAD of the pixel values of TA3 and TA to obtain SAD3 as the distortion value corresponding to MV3.

Similarly, obtain the distortion value SAD4 corresponding to MV4 and the distortion value SAD5 corresponding to MV5.
S1508: Sort SAD3, SAD4, and SAD5 in ascending order; consistently with the encoding end, the ascending order is SAD5 < SAD3 < SAD4.

S1509: According to the magnitude relationship of the distortion values, assign bin strings to MV3, MV4, and MV5 as follows:

MV5 corresponds to "0001", MV3 corresponds to "00001", and MV4 corresponds to "00000".

S1510: Compare the bin strings assigned to the pieces of motion information in the preceding steps with the bin string "0001" corresponding to the predicted motion information identifier of block 250 parsed from the bitstream. Since the bin string of MV5 is also "0001", MV5 is determined to be the predicted motion information of block 250.
In some embodiments, the method of the embodiments of the present application may be used to build the candidate predicted motion information list of the Merge mode, the AMVP mode, or other inter prediction modes of H.265 or the H.266 standard under development, and to represent the identifier of the predicted motion information actually encoded.

In some embodiments, the method of the embodiments of the present application may be used to build the candidate predicted motion information (matched-block displacement vector) list of motion-estimation-based intra prediction, and to represent the identifier of the predicted motion information actually encoded.

In some embodiments, the method of the embodiments of the present application may be used to build the candidate predicted motion information (matched-block displacement vector) list of the intra block copy mode of the SCC standard, and to represent the identifier of the predicted motion information actually encoded.

In some embodiments, the method of the embodiments of the present application may be used to build the candidate predicted motion information list of inter, intra, and inter-view prediction of 3D or multi-view coding standards, and to represent the identifier of the predicted motion information actually encoded.

In some embodiments, the method of the embodiments of the present application may be used to build the candidate predicted motion information list of inter, intra, and inter-layer prediction of scalable coding standards, and to represent the identifier of the predicted motion information actually encoded.
Correspondingly, in each of the foregoing embodiments, the adjacent reconstructed image block of the to-be-processed block that is used to characterize the similarity of reference image blocks (the template in the foregoing specific embodiments) may be a spatially adjacent reconstructed image block, a temporally adjacent reconstructed image block, a reconstructed image block of an adjacent view, a reconstructed image block of an adjacent layer, or a scaled version of any of the foregoing reconstructed image blocks.
FIG. 17 is a schematic block diagram of an encoding apparatus 1700 according to an embodiment of this application, including:
an obtaining module 1701, configured to obtain N candidate predicted motion information of a to-be-processed image block, where N is an integer greater than 1;
a calculation module 1702, configured to obtain a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of a reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block;
a comparison module 1703, configured to determine, based on the magnitude relationship among the N obtained distortion values, first identification information of each of the N candidate predicted motion information, where the N candidate predicted motion information are in one-to-one correspondence with their respective first identification information; and
an encoding module 1704, configured to: when target predicted motion information of the to-be-processed image block is one of the N candidate predicted motion information whose first identification information has been determined, encode the first identification information of the target predicted motion information into a bitstream.
In a feasible implementation 17001, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, the adjacent reconstructed image block of the to-be-processed image block includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block, and that the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block has the same shape and size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
In a feasible implementation 17002, the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute error of the two blocks; the sum of absolute errors of the two blocks; the sum of squared errors of the two blocks; the mean squared error of the two blocks; the sum of absolute Hadamard-transformed errors of the two blocks; the normalized cross-correlation measure of the two blocks; or a similarity measure of the two blocks based on sequential similarity detection.
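As an illustrative sketch (not part of the original text) of several of the difference characterization values listed in implementation 17002, the following hypothetical helpers compute SAD, MAE, SSE, and MSE between an original adjacent reconstructed template and a reference adjacent reconstructed template, each given as a flat list of pixel values:

```python
def sad(original, reference):
    """Sum of absolute errors between two equally sized templates."""
    return sum(abs(o - r) for o, r in zip(original, reference))

def mae(original, reference):
    """Mean absolute error."""
    return sad(original, reference) / len(original)

def sse(original, reference):
    """Sum of squared errors."""
    return sum((o - r) ** 2 for o, r in zip(original, reference))

def mse(original, reference):
    """Mean squared error."""
    return sse(original, reference) / len(original)

original = [100, 102, 98, 101]
reference = [101, 100, 99, 105]
print(sad(original, reference))  # 1 + 2 + 1 + 4 = 8
print(sse(original, reference))  # 1 + 4 + 1 + 16 = 22
```

Any one of these measures (or the Hadamard, correlation, or sequential-similarity variants) can serve as the Delta used by the distortion formula of implementation 17008.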
In a feasible implementation 17003, the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block adjoins the upper boundary of the to-be-processed image block, including: the original adjacent reconstructed image block has width W and height n; or the original adjacent reconstructed image block has width W+H and height n, where W, H, and n are positive integers.
In a feasible implementation 17004, the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block adjoins the left boundary of the to-be-processed image block, including: the original adjacent reconstructed image block has width n and height H; or the original adjacent reconstructed image block has width n and height W+H, where W, H, and n are positive integers.
In a feasible implementation 17005, n is 1 or 2.
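A hypothetical sketch (not from the original; the coordinate convention with the origin at the top-left of the frame is an assumption) of the template geometry in implementations 17003-17005: for a W x H block at position (x, y), the above template is an n-row strip whose lower boundary adjoins the block's upper boundary, and the left template is an n-column strip whose right boundary adjoins the block's left boundary.

```python
def above_template_rect(x, y, w, h, n, extended=False):
    """Rectangle (x0, y0, width, height) of the above template.
    extended=True selects the W+H-wide variant of implementation 17003."""
    width = w + h if extended else w
    return (x, y - n, width, n)

def left_template_rect(x, y, w, h, n, extended=False):
    """Rectangle (x0, y0, width, height) of the left template.
    extended=True selects the W+H-tall variant of implementation 17004."""
    height = w + h if extended else h
    return (x - n, y, n, height)

print(above_template_rect(32, 16, 8, 4, 2))        # (32, 14, 8, 2)
print(above_template_rect(32, 16, 8, 4, 2, True))  # (32, 14, 12, 2)
print(left_template_rect(32, 16, 8, 4, 1))         # (31, 16, 1, 4)
```

The same rectangles, shifted by the candidate motion vector, give the corresponding reference adjacent reconstructed templates.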
In a feasible implementation 17006, the encoding apparatus is used for inter bidirectional prediction. Correspondingly, the reference image block indicated by the candidate predicted motion information includes a first reference image block and a second reference image block, and the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing the per-pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
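The two bidirectional options of implementation 17006 can be sketched as follows (an illustration, not the original implementation; SAD is chosen here as the difference characterization value): option (a) averages the two reference templates pixel-wise before comparing with the original template, while option (b) computes one distortion per reference template and averages the two values.

```python
def sad(a, b):
    """Sum of absolute errors between two equally sized templates."""
    return sum(abs(x - y) for x, y in zip(a, b))

def distortion_avg_template(original, ref0, ref1):
    """Option (a): compare the original template with the per-pixel
    mean of the two reference templates."""
    avg = [(p0 + p1) / 2 for p0, p1 in zip(ref0, ref1)]
    return sad(original, avg)

def distortion_avg_values(original, ref0, ref1):
    """Option (b): mean of the two per-reference difference values."""
    return (sad(original, ref0) + sad(original, ref1)) / 2

original, ref0, ref1 = [10, 20], [8, 22], [12, 18]
print(distortion_avg_template(original, ref0, ref1))  # 0.0
print(distortion_avg_values(original, ref0, ref1))    # 4.0
```

The example shows that the two options are not equivalent: opposite-signed errors in the two reference templates cancel under option (a) but not under option (b).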
In a feasible implementation 17007, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a plurality of reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block, and correspondingly, the adjacent reconstructed image block of the to-be-processed image block includes a plurality of original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
In a feasible implementation 17008, the distortion value is obtained according to the following formula:

Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|

where Distortion denotes the distortion value, |Delta(Original_i, Reference_i)| denotes the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p denotes the number of original adjacent reconstructed image blocks used to compute the distortion value.
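The summation of implementation 17008 can be sketched directly (an illustration, not the original implementation; SAD stands in for the difference characterization value Delta):

```python
def delta(original, reference):
    """Difference characterization value between one original template
    and one reference template (SAD chosen for illustration)."""
    return sum(abs(o - r) for o, r in zip(original, reference))

def total_distortion(original_templates, reference_templates):
    """Distortion = sum over i = 1..p of |Delta(Original_i, Reference_i)|."""
    return sum(delta(o, r)
               for o, r in zip(original_templates, reference_templates))

above = ([1, 2, 3], [1, 3, 3])   # (original, reference) above templates
left = ([4, 5], [6, 5])          # (original, reference) left templates
print(total_distortion([above[0], left[0]], [above[1], left[1]]))  # 1 + 2 = 3
```

With p = 2 this covers the third/fourth template pairs of implementation 17007; with p = 1 it reduces to the single-template case.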
In a feasible implementation 17009, the comparison module 1703 is specifically configured to: compare the magnitudes of the N distortion values; and assign, according to the comparison result, the first identification information to each of the N candidate predicted motion information, where the length of the binary string of the first identification information of a candidate predicted motion information with a smaller distortion value is less than or equal to the length of the binary string of the first identification information used to encode a candidate predicted motion information with a larger distortion value.
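A hypothetical sketch of the length-ordered assignment in implementation 17009: the candidates are ranked by distortion and each rank is mapped to a binary string whose length is non-decreasing in the distortion. A truncated unary code is used here purely as one example of such a mapping; the original text does not prescribe a specific binarization.

```python
def truncated_unary(rank, n):
    """Truncated unary code for rank in [0, n): '1' * rank + '0',
    with the last code dropping the terminating '0'."""
    return "1" * rank + ("" if rank == n - 1 else "0")

def assign_bin_strings(distortions):
    """Map each candidate index to a bin string, shortest strings going
    to the candidates with the smallest distortion values."""
    order = sorted(range(len(distortions)), key=lambda i: distortions[i])
    n = len(distortions)
    return {idx: truncated_unary(rank, n) for rank, idx in enumerate(order)}

print(assign_bin_strings([30, 5, 12]))  # {1: '0', 2: '10', 0: '11'}
```

Because both encoder and decoder derive the distortion values from reconstructed templates only, they arrive at the same assignment without any extra signaling.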
In a feasible implementation 17010, the comparison module 1703 is specifically configured to arrange the N candidate predicted motion information in ascending or descending order of their distortion values.
In a feasible implementation 17011, the obtaining module 1701 is specifically configured to obtain, in a preset order, motion information of N mutually different image blocks having preset positional relationships with the to-be-processed image block as the N candidate predicted motion information.
In a feasible implementation 17012, the obtaining module 1701 is specifically configured to: obtain, in a preset order, motion information of M mutually different image blocks having preset positional relationships with the to-be-processed image block as M candidate predicted motion information, where the M candidate predicted motion information include the N candidate predicted motion information and M is an integer greater than N; determine a grouping manner of the M candidate predicted motion information; and determine the N candidate predicted motion information from the M candidate predicted motion information according to the grouping manner.
In a feasible implementation 17013, after the grouping manner of the M candidate predicted motion information is determined, the encoding module 1704 is further configured to encode the grouping manner into the bitstream.
In a feasible implementation 17014, after the N candidate predicted motion information of the to-be-processed image block are obtained, the encoding module 1704 is further configured to: encode the N candidate predicted motion information into the bitstream; or encode, into the bitstream, second identification information indicating the N image blocks having preset positional relationships with the to-be-processed image block; or encode, into the bitstream, third identification information having a preset correspondence with the N candidate predicted motion information.
In a feasible implementation 17015, the encoding apparatus 1700 further includes a detection module 1705, and before the distortion value corresponding to each of the N candidate predicted motion information is obtained, the detection module 1705 is further configured to determine that the adjacent reconstructed image block of the to-be-processed image block exists.
In a feasible implementation 17016, the detection module 1705 is specifically configured to determine that at least one original adjacent reconstructed image block of the at least two original adjacent reconstructed image blocks exists.
In a feasible implementation 17017, the encoding apparatus 1700 further includes a decision module 1706, and before the distortion value corresponding to each of the N candidate predicted motion information is obtained, the decision module 1706 is configured to determine to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
In a feasible implementation 17018, the decision module 1706 is specifically configured to determine, according to the grouping manner, to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
In a feasible implementation 17019, after it is determined to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information, the encoding module 1704 is further configured to encode fourth identification information into the bitstream, where the fourth identification information is used to determine to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
In a feasible implementation 17020, the obtaining module 1701 is further configured to determine P candidate predicted motion information from the M candidate predicted motion information, where no candidate predicted motion information is shared between the P candidate predicted motion information and the N candidate predicted motion information, P is a positive integer, and P is less than M-1.
For the specific execution method and beneficial technical effects of each module, refer to the detailed description of the corresponding steps of the encoding method 1000 in the embodiments of this application; details are not repeated here.
FIG. 18 is a schematic block diagram of a decoding apparatus 1800 according to an embodiment of this application, including:
a parsing module 1801, configured to parse, from a bitstream, target identification information of target predicted motion information of a to-be-processed image block;
an obtaining module 1802, configured to determine N candidate predicted motion information, where the N candidate predicted motion information include the target predicted motion information and N is an integer greater than 1;
a calculation module 1803, configured to obtain a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of a reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block;
a comparison module 1804, configured to determine, based on the magnitude relationship among the N obtained distortion values, first identification information of each of the N candidate predicted motion information, where the N candidate predicted motion information are in one-to-one correspondence with their respective first identification information; and
a selection module 1805, configured to determine, as the target predicted motion information, the candidate predicted motion information corresponding to the first identification information that matches the target identification information.
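The decoder-side selection can be sketched end to end (a hypothetical illustration, not the original implementation; the truncated unary mapping stands in for whatever length-ordered assignment both sides agree on): the decoder derives the same distortion-ranked bin strings as the encoder and returns the candidate whose first identification information equals the parsed target identifier.

```python
def truncated_unary(rank, n):
    """Truncated unary code: '1' * rank + '0', last code has no '0'."""
    return "1" * rank + ("" if rank == n - 1 else "0")

def select_candidate(candidates, distortions, target_bin_string):
    """Return the candidate whose assigned bin string matches the
    target identification information parsed from the bitstream."""
    order = sorted(range(len(candidates)), key=lambda i: distortions[i])
    for rank, idx in enumerate(order):
        if truncated_unary(rank, len(candidates)) == target_bin_string:
            return candidates[idx]
    raise ValueError("no candidate matches the target identifier")

mvs = ["MV1", "MV2", "MV3"]
print(select_candidate(mvs, [30, 5, 12], "10"))  # MV3
```

Since the distortion values come only from reconstructed templates, the decoder's ranking is guaranteed to match the encoder's, so the short identifiers decode unambiguously.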
In a feasible implementation 18001, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a reference adjacent reconstructed image block, the adjacent reconstructed image block of the to-be-processed image block includes an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block, and that the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block includes: the distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block has the same shape and size as the original adjacent reconstructed image block, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
In a feasible implementation 18002, the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the mean absolute error of the two blocks; the sum of absolute errors of the two blocks; the sum of squared errors of the two blocks; the mean squared error of the two blocks; the sum of absolute Hadamard-transformed errors of the two blocks; the normalized cross-correlation measure of the two blocks; or a similarity measure of the two blocks based on sequential similarity detection.
In a feasible implementation 18003, the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block adjoins the upper boundary of the to-be-processed image block, including: the original adjacent reconstructed image block has width W and height n; or the original adjacent reconstructed image block has width W+H and height n, where W, H, and n are positive integers.
In a feasible implementation 18004, the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block adjoins the left boundary of the to-be-processed image block, including: the original adjacent reconstructed image block has width n and height H; or the original adjacent reconstructed image block has width n and height W+H, where W, H, and n are positive integers.
In a feasible implementation 18005, n is 1 or 2.
In a feasible implementation 18006, the reference image block indicated by the candidate predicted motion information includes a first reference image block and a second reference image block, and correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block. Correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing the per-pixel mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
In a feasible implementation 18007, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information includes a plurality of reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block, and correspondingly, the adjacent reconstructed image block of the to-be-processed image block includes a plurality of original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block. That the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block includes: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
In a feasible implementation 18008, the distortion value is obtained according to the following formula:

Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|

where Distortion denotes the distortion value, |Delta(Original_i, Reference_i)| denotes the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p denotes the number of original adjacent reconstructed image blocks used to compute the distortion value.
在一种可行的实施方式18009中,所述比较模块1804具体用于:比较所述N个失真值之间的大小;按照所述比较结果赋予所述N个候选预测运动信息各自的第一标识信息,其中,所述失真值较小的候选预测运动信息的第一标识信息的二进制字符串的长度小于等于所述失真值较大的候选预测运动信息的第一标识信息的二进制字符串的长度。In a possible implementation 18009, the comparing module 1804 is specifically configured to: compare a size between the N distortion values; and assign a first identifier of each of the N candidate prediction motion information according to the comparison result Information, wherein a length of a binary character string of the first identification information of the candidate prediction motion information having a smaller distortion value is less than a length of a binary character string of the first identification information of the candidate prediction motion information having the larger distortion value .
在一种可行的实施方式18010中,所述比较模块1804具体用于:按照所述失真值从小到大或者从大到小的顺序,顺序排列所述N个候选预测运动信息。In a possible implementation 18010, the comparison module 1804 is specifically configured to sequentially arrange the N candidate predicted motion information according to the distortion values from small to large or from large to small.
在一种可行的实施方式18011中,所述获取模块1802具体用于:按照预设的顺序,获取N个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为所述N个候选预测运动信息。In a possible implementation 18011, the acquiring module 1802 is specifically configured to: acquire, according to a preset sequence, N pieces of motion information of image blocks having a preset positional relationship with the image block to be processed that are different from each other. The N candidate prediction motion information.
在一种可行的实施方式18012中,所述获取模块1802具体用于:按照预设的顺序,获取M个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为M个候选预测运动信息,其中,所述M个候选预测运动信息包括所述N个候选预测运动信息,M为大于N的整数;确定所述M个候选预测运动信息的分组方式;根据所述目标标识信息和所述分组方式,从所述M个候选预测运动信息中确定所述N个候选预测运动信息。In a possible implementation manner, the acquisition module 1802 is specifically configured to: acquire motion information of M image blocks having a preset positional relationship with the image block to be processed that are different from each other according to a preset sequence. As the M candidate prediction motion information, where the M candidate prediction motion information includes the N candidate prediction motion information, M is an integer greater than N, and determining a grouping manner of the M candidate prediction motion information; Determining the target identification information and the grouping manner, and determining the N candidate predicted motion information from the M candidate predicted motion information.
在一种可行的实施方式18013中,所述获取模块1802具体用于:确定预设的所述分组方式;或者,从所述码流中解析获得所述分组方式。In a possible implementation manner, the obtaining module 1802 is specifically configured to: determine the preset grouping manner; or obtain the grouping manner by parsing from the code stream.
在一种可行的实施方式18014中,所述获取模块1802具体用于:解析所述码流中的所述多个候选预测运动信息的编码信息,以获得所述N个候选预测运动信息;或者,解析所述码流中的第二标识信息,以获得所述第二标识信息指示的N个候选图像块,并以所述N个候选图像块的运动信息作为所述N个候选预测运动信息;或者,解析所述码流中的第三标识信息,以获得与所述第三标识信息具有预设对应关系的所述N个候选预测运动信息。In a possible implementation 18014, the acquisition module 1802 is specifically configured to: parse coding information of the multiple candidate predicted motion information in the code stream to obtain the N candidate predicted motion information; or parse second identification information in the code stream to obtain N candidate image blocks indicated by the second identification information, and use the motion information of the N candidate image blocks as the N candidate predicted motion information; or parse third identification information in the code stream to obtain the N candidate predicted motion information that has a preset correspondence with the third identification information.
在一种可行的实施方式18015中,所述装置1800还包括:In a possible implementation 18015, the apparatus 1800 further includes:
检测模块1806,用于确定所述待处理图像块的相邻重构图像块可用。The detecting module 1806 is configured to determine that adjacent reconstructed image blocks of the image block to be processed are available.
在一种可行的实施方式18016中,当所述待处理图像块的相邻重构图像块包括至少两个所述原始相邻重构图像块时,所述检测模块1806具体用于:确定所述至少两个所述原始相邻重构图像块中的至少一个原始相邻重构图像块可用。In a possible implementation 18016, when the adjacent reconstructed image blocks of the to-be-processed image block include at least two of the original adjacent reconstructed image blocks, the detecting module 1806 is specifically configured to determine that at least one of the at least two original adjacent reconstructed image blocks is available.
在一种可行的实施方式18017中,所述装置还包括:In a possible implementation 18017, the apparatus further includes:
决策模块1807,用于确定执行所述获取所述N个候选预测运动信息各自对应的失真值。The decision module 1807 is configured to determine to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
在一种可行的实施方式18018中,所述决策模块1807具体用于:根据所述分组方式,确定执行所述获取所述N个候选预测运动信息各自对应的失真值;或者,解析所述码流中的第四标识信息以确定执行所述获取所述N个候选预测运动信息各自对应的失真值。In a possible implementation 18018, the decision module 1807 is specifically configured to: determine, according to the grouping manner, to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information; or parse fourth identification information in the code stream to determine to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
各模块的具体执行方法以及有益技术效果可以参考本申请实施例中解码方法1100的对应步骤的详细描述,不再赘述。For the specific execution methods of the modules and their beneficial technical effects, refer to the detailed description of the corresponding steps of decoding method 1100 in the embodiments of this application; details are not repeated here.
图19示出了根据本申请实施例的装置1900的示意性框图。该装置包括:FIG. 19 shows a schematic block diagram of an apparatus 1900 according to an embodiment of this application. The apparatus includes:
存储器1901,用于存储程序,所述程序包括代码;a memory 1901, configured to store a program, where the program includes a code;
收发器1902,用于和其他设备进行通信;a transceiver 1902 for communicating with other devices;
处理器1903,用于执行存储器1901中的程序代码。The processor 1903 is configured to execute program code in the memory 1901.
可选地,当所述代码被执行时,所述处理器1903可以实现方法1000或者方法1100的各个操作,不再赘述。收发器1902用于在处理器1903的驱动下执行具体的信号收发。Optionally, when the code is executed, the processor 1903 may implement various operations of the method 1000 or the method 1100, and details are not described herein. The transceiver 1902 is configured to perform specific signal transceiving under the driving of the processor 1903.
虽然已基于视频编码器20及视频解码器30描述本申请的特定方面,但应理解,本发明的技术可通过许多其它视频编码和/或解码单元、处理器、处理单元、例如编码器/解码器的基于硬件的解码单元及类似者来应用。此外,应理解,仅作为可行的实施方式而提供本申请中各示意性流程图所展示及描述的步骤。即,本申请中各示意性流程图的可行的实施方式中所展示的步骤不需要一定按本申请中各示意性流程图中所展示的次序执行,且可执行更少、额外或替代步骤。Although specific aspects of this application have been described based on video encoder 20 and video decoder 30, it should be understood that the techniques of this application may be applied by many other video encoding and/or decoding units, processors, processing units, hardware-based coding units such as encoders/decoders, and the like. In addition, it should be understood that the steps shown and described in the illustrative flowcharts of this application are provided only as possible implementations. That is, the steps shown in the possible implementations of the illustrative flowcharts of this application need not necessarily be performed in the order shown, and fewer, additional, or alternative steps may be performed.
此外,应理解,取决于可行的实施方式,本文中所描述的方法中的任一者的特定动作或事件可按不同序列执行,可经添加、合并或一起省去(例如,并非所有所描述的动作或事件为实践方法所必要的)。此外,在特定可行的实施方式中,动作或事件可经由多线程处理、中断处理或多个处理器来同时而非顺序地执行。另外,虽然出于清楚的目的将本申请的特定方面描述为通过单一模块或单元执行,但应理解,本申请的技术可通过与视频解码器相关联的单元或模块的组合执行。In addition, it should be understood that, depending on the possible implementation, specific actions or events of any of the methods described herein may be performed in a different sequence, and may be added, combined, or omitted altogether (e.g., not all described actions or events are necessary to practice the methods). Moreover, in certain possible implementations, actions or events may be performed concurrently rather than sequentially, for example via multi-threaded processing, interrupt processing, or multiple processors. In addition, although specific aspects of this application are described, for clarity, as being performed by a single module or unit, it should be understood that the techniques of this application may be performed by a combination of units or modules associated with a video decoder.
在一个或多个可行的实施方式中,所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么功能可作为一个或多个指令或代码而存储于计算机可读媒体上或经由计算机可读媒体来传输,且通过基于硬件的处理单元来执行。计算机可读媒体可包含计算机可读存储媒体或通信媒体,计算机可读存储媒体对应于例如数据存储媒体的有形媒体,通信媒体包含促进计算机程序根据通信协议从一处传送到另一处的任何媒体。In one or more possible implementations, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to tangible media such as data storage media, or communication media, which includes any medium that facilitates transfer of a computer program from one place to another according to a communication protocol.
以这个方式,计算机可读媒体示例性地可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)例如信号或载波的通信媒体。数据存储媒体可为可由一个或多个计算机或一个或多个处理器存取以检索用于实施本申请中所描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。In this manner, computer readable media may illustratively correspond to (1) a non-transitory tangible computer readable storage medium, or (2) a communication medium such as a signal or carrier. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application. The computer program product can comprise a computer readable medium.
作为可行的实施方式而非限制,此计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用于存储呈指令或数据结构的形式的所要代码且可由计算机存取的任何其它媒体。同样,任何连接可适当地称作计算机可读媒体。例如,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL),或例如红外线、无线电及微波的无线技术而从网站、服务器或其它远端源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL,或例如红外线、无线电及微波的无线技术包含于媒体的定义中。As a possible implementation and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
然而,应理解,计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体,而替代地针对非暂时性有形存储媒体。如本文中所使用,磁盘及光盘包含紧密光盘(CD)、雷射光盘、光盘、数字多功能光盘(DVD)、软性磁盘及蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘通过雷射以光学方式再现数据。以上各物的组合也应包含于计算机可读媒体的范围内。However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
可通过例如一个或多个数字信号处理器、通用微处理器、专用集成电路、现场可编程门阵列或其它等效集成或离散逻辑电路的一个或多个处理器来执行指令。因此,如本文中所使用,术语“处理器”可指前述结构或适于实施本文中所描述的技术的任何其它结构中的任一者。另外,在一些方面中,可将本文所描述的功能性提供于经配置以用于编码及解码的专用硬件和/或软件模块内,或并入于组合式编码解码器中。同样,技术可完全实施于一个或多个电路或逻辑元件中。The instructions may be executed by one or more processors, such as one or more digital signal processors, general purpose microprocessors, application specific integrated circuits, field programmable gate arrays, or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques can be fully implemented in one or more circuits or logic elements.
本申请的技术可实施于广泛多种装置或设备中,包含无线手机、集成电路(IC)或IC的集合(例如,芯片组)。本申请中描述各种组件、模块或单元以强调经配置以执行所揭示的技术的装置的功能方面,但未必需要通过不同硬件单元实现。更确切来说,如上文所描述,各种单元可组合于编码解码器硬件单元中或由互操作的硬件单元(包含如上文所描述的一个或多个处理器)结合合适软件和/或固件的集合来提供。The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of apparatuses configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。The foregoing descriptions are merely exemplary specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (41)

  1. 一种图像块预测运动信息的解码方法,其特征在于,包括:A method for decoding image block prediction motion information, comprising:
    从码流中解析出待处理图像块的目标预测运动信息的目标标识信息;Parsing target identification information of target predicted motion information of the image block to be processed from the code stream;
    确定N个候选预测运动信息,所述N个候选预测运动信息包括所述目标预测运动信息,其中,N为大于1的整数;Determining N candidate predicted motion information, the N candidate predicted motion information including the target predicted motion information, where N is an integer greater than 1;
    获取所述N个候选预测运动信息各自对应的失真值,所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定;Obtaining a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of a reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block;
    根据所述获取的N个失真值之间的大小关系,确定所述N个候选预测运动信息各自的第一标识信息,所述N个候选预测运动信息和各自的第一标识信息一一对应;Determining, according to the magnitude relationship between the acquired N distortion values, first identification information of each of the N candidate prediction motion information, where the N candidate prediction motion information and the respective first identification information are in one-to-one correspondence;
    将与所述目标标识信息匹配的第一标识信息对应的候选预测运动信息确定为所述目标预测运动信息。The candidate predicted motion information corresponding to the first identification information that matches the target identification information is determined as the target predicted motion information.
  2. 根据权利要求1所述的方法,其特征在于,所述候选预测运动信息指示的参考图像块的相邻重构图像块包括参考相邻重构图像块,所述待处理图像块的相邻重构图像块包括与所述参考相邻重构图像块对应的原始相邻重构图像块,所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定,包括:The method according to claim 1, wherein the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information comprises a reference adjacent reconstructed image block, the adjacent reconstructed image block of the to-be-processed image block comprises an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block, and that the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block comprises:
    所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示,所述参考相邻重构图像块与所述原始相邻重构图像块形状相同、大小相等,且所述参考相邻重构图像块和所述参考图像块之间的位置关系与所述原始相邻重构图像块和所述待处理图像块之间的位置关系相同。The distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block and the original adjacent reconstructed image block have the same shape and equal size, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
  3. 根据权利要求2所述的方法,其特征在于,所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值,包括:The method according to claim 2, wherein the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises:
    所述参考相邻重构图像块和所述原始相邻重构图像块的平均绝对误差(MAD);a mean absolute difference (MAD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    所述参考相邻重构图像块和所述原始相邻重构图像块的绝对误差和(SAD);a sum of absolute differences (SAD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    所述参考相邻重构图像块和所述原始相邻重构图像块的误差平方和(SSD);a sum of squared differences (SSD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    所述参考相邻重构图像块和所述原始相邻重构图像块的平均误差平方和(MSD);a mean squared difference (MSD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    所述参考相邻重构图像块和所述原始相邻重构图像块的绝对哈达玛变换误差和(SATD);a sum of absolute Hadamard-transformed differences (SATD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    所述参考相邻重构图像块和所述原始相邻重构图像块的归一化积相关性度量值(NCC);或,a normalized cross-correlation measure (NCC) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or
    所述参考相邻重构图像块和所述原始相邻重构图像块的基于序贯相似性检测(SSDA)的相似性度量值。a similarity measure, based on sequential similarity detection (SSDA), of the reference adjacent reconstructed image block and the original adjacent reconstructed image block.
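Purely as an illustration of three of the difference characterization values enumerated above (the claims, not this sketch, define the method), SAD, SSD, and MAD between a reference template and the original template, each represented as a flat list of pixel values:

```python
def sad(ref, orig):
    """Sum of absolute differences between two equal-size templates."""
    return sum(abs(r - o) for r, o in zip(ref, orig))

def ssd(ref, orig):
    """Sum of squared differences between two equal-size templates."""
    return sum((r - o) ** 2 for r, o in zip(ref, orig))

def mad(ref, orig):
    """Mean absolute difference: SAD normalized by the pixel count."""
    return sad(ref, orig) / len(ref)

ref, orig = [10, 12, 8, 9], [11, 10, 8, 13]
# SAD = 1 + 2 + 0 + 4 = 7; SSD = 1 + 4 + 0 + 16 = 21; MAD = 7 / 4
```

The remaining measures (MSD, SATD, NCC, SSDA) follow the same template-comparison pattern; SAD is typically preferred in practice because it needs no multiplications.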
  4. 根据权利要求2或3所述的方法,其特征在于,所述待处理图像块为矩形,所述待处理图像块的宽为W,高为H,所述原始相邻重构图像块为矩形,所述原始相邻重构图像块的下边界与所述待处理图像块的上边界相邻,包括:The method according to claim 2 or 3, wherein the image block to be processed is a rectangle, the width of the image block to be processed is W, the height is H, and the original adjacent reconstructed image block is a rectangle. The lower boundary of the original adjacent reconstructed image block is adjacent to an upper boundary of the image block to be processed, and includes:
    所述原始相邻重构图像块的宽为W,高为n;或者,The original adjacent reconstructed image block has a width W and a height n; or
    所述原始相邻重构图像块的宽为W+H,高为n;其中W,H,n为正整数。The original adjacent reconstructed image block has a width of W+H and a height of n; wherein W, H, and n are positive integers.
  5. 根据权利要求4所述的方法,其特征在于,n为1或2。The method of claim 4 wherein n is 1 or 2.
  6. 根据权利要求2或3所述的方法,其特征在于,所述待处理图像块为矩形,所述待处理图像块的宽为W,高为H,所述原始相邻重构图像块为矩形,所述原始相邻重构图像块的右边界与所述待处理图像块的左边界相邻,包括:The method according to claim 2 or 3, wherein the image block to be processed is a rectangle, the width of the image block to be processed is W, the height is H, and the original adjacent reconstructed image block is a rectangle. The right boundary of the original adjacent reconstructed image block is adjacent to a left boundary of the image block to be processed, and includes:
    所述原始相邻重构图像块的宽为n,高为H;或者,The original adjacent reconstructed image block has a width n and a height H; or
    所述原始相邻重构图像块的宽为n,高为W+H;其中W,H,n为正整数。The original adjacent reconstructed image block has a width n and a height W+H; wherein W, H, n are positive integers.
  7. 根据权利要求6所述的方法,其特征在于,n为1或2。The method of claim 6 wherein n is 1 or 2.
  8. 根据权利要求2至7任一项所述的方法,其特征在于,所述候选预测运动信息指示的参考图像块包括第一参考图像块和第二参考图像块,对应的,所述候选预测运动信息指示的参考图像块的相邻重构图像块包括第一参考相邻重构图像块和第二参考相邻重构图像块,对应的,所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示,包括:The method according to any one of claims 2 to 7, wherein the reference image block indicated by the candidate predicted motion information comprises a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information comprises a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block; and correspondingly, that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises:
    所述失真值由平均参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示,其中,所述平均参考相邻重构图像块由计算所述第一参考相邻重构图像块和所述第二参考相邻重构图像块的像素均值获得;或者,The distortion value is represented by a difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the average reference adjacent reconstructed image block is obtained by computing a pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or
    所述失真值由第一差异表征值和第二差异表征值的均值来表示,其中,所述第一差异表征值由所述第一参考相邻重构图像块和所述原始相邻重构图像块的所述差异表征值来表示,所述第二差异表征值由所述第二参考相邻重构图像块和所述原始相邻重构图像块的所述差异表征值来表示。The distortion value is represented by a mean of a first difference characterization value and a second difference characterization value, where the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
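The two alternatives of claim 8 for bi-predicted candidates can be sketched as follows (an illustrative assumption outside the claims, with templates as flat pixel lists and `metric` standing for any difference characterization value of claim 3):

```python
def distortion_avg_template(ref1, ref2, orig, metric):
    """Option 1: average the two reference templates pixel-wise,
    then measure the difference to the original template."""
    avg = [(a + b) / 2 for a, b in zip(ref1, ref2)]
    return metric(avg, orig)

def distortion_avg_values(ref1, ref2, orig, metric):
    """Option 2: measure each reference template against the original
    template separately and average the two difference values."""
    return (metric(ref1, orig) + metric(ref2, orig)) / 2

sad = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
d1 = distortion_avg_template([10, 20], [14, 16], [11, 19], sad)  # 2.0
d2 = distortion_avg_values([10, 20], [14, 16], [11, 19], sad)    # 4.0
```

As the example shows, the two options generally yield different distortion values, so an encoder and decoder must agree on which one is in use.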
  9. 根据权利要求2至8任一项所述的方法,其特征在于,所述候选预测运动信息指示的参考图像块的相邻重构图像块包括多个所述参考相邻重构图像块,所述多个所述参考相邻重构图像块包括第三参考相邻重构图像块和第四参考相邻重构图像块,对应的,所述待处理图像块的相邻重构图像块包括多个所述原始相邻重构图像块,所述多个所述原始相邻重构图像块包括第三原始相邻重构图像块和第四原始相邻重构图像块,所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示,包括:所述失真值由所述第三参考相邻重构图像块和所述第三原始相邻重构图像块的差异表征值以及所述第四参考相邻重构图像块和所述第四原始相邻重构图像块的差异表征值之和来表示。The method according to any one of claims 2 to 8, wherein the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information comprises multiple reference adjacent reconstructed image blocks, the multiple reference adjacent reconstructed image blocks comprise a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; correspondingly, the adjacent reconstructed image block of the to-be-processed image block comprises multiple original adjacent reconstructed image blocks, and the multiple original adjacent reconstructed image blocks comprise a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block; and that the distortion value is represented by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  10. 根据权利要求9所述的方法,其特征在于,所述失真值根据如下计算式获得:The method according to claim 9, wherein the distortion value is obtained according to the following calculation formula:
    Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|
    其中,Distortion表示所述失真值,|Delta(Original i,Reference i)|表示第i个原始相邻重构图像块和第i个参考相邻重构图像块的所述差异表征值,p表示用于计算所述失真值的所述原始相邻重构图像块的个数。Where Distortion represents the distortion value, |Delta(Original_i, Reference_i)| represents the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p represents the number of original adjacent reconstructed image blocks used to calculate the distortion value.
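An illustrative rendering (outside the claims) of the summation in claim 10: the total distortion accumulates the per-template difference characterization values over the p original/reference template pairs, e.g. an above template and a left template.

```python
def total_distortion(original_blocks, reference_blocks, delta):
    """Distortion = sum over i of |Delta(Original_i, Reference_i)|,
    accumulated over the p original/reference template pairs. Here
    delta is assumed to be a non-negative difference characterization
    value (e.g. SAD), so the absolute value is already implied."""
    return sum(delta(o, r) for o, r in zip(original_blocks, reference_blocks))

sad = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
top_orig, top_ref = [1, 2], [1, 3]     # hypothetical above-template pair
left_orig, left_ref = [5, 5], [4, 7]   # hypothetical left-template pair
d = total_distortion([top_orig, left_orig], [top_ref, left_ref], sad)
# per-pair SADs are 1 and 3, so Distortion = 4 with p = 2
```

Each candidate's templates are fetched at its motion-compensated position, so candidates whose templates match the decoded neighborhood well receive small totals and, per implementation 18009, short identifiers.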
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述根据所述获取的N个失真值之间的大小关系,确定所述N个候选预测运动信息各自的第一标识信息,包括:The method according to any one of claims 1 to 10, wherein the determining, according to the magnitude relationship among the obtained N distortion values, the first identification information of each of the N candidate predicted motion information comprises:
    比较所述N个失真值之间的大小;Comparing the size between the N distortion values;
    按照所述比较结果赋予所述N个候选预测运动信息各自的第一标识信息,其中,所述失真值较小的候选预测运动信息的第一标识信息的二进制字符串(bin string)的长度小于等于所述失真值较大的候选预测运动信息的第一标识信息的二进制字符串的长度。The first identification information is assigned to each of the N candidate predicted motion information according to the comparison result, where the length of the binary string (bin string) of the first identification information of candidate predicted motion information with a smaller distortion value is less than or equal to the length of the binary string of the first identification information of candidate predicted motion information with a larger distortion value.
  12. 根据权利要求11所述的方法,其特征在于,所述比较所述N个失真值之间的大小,包括:The method according to claim 11, wherein said comparing a size between said N distortion values comprises:
    按照所述失真值从小到大或者从大到小的顺序,顺序排列所述N个候选预测运动信息。The N candidate predicted motion information are sequentially arranged in order of the distortion values from small to large or from large to small.
  13. 根据权利要求1至12任一项所述的方法,其特征在于,所述确定N个候选预测运动信息,包括:The method according to any one of claims 1 to 12, wherein the determining the N candidate predicted motion information comprises:
    按照预设的顺序,获取N个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为所述N个候选预测运动信息。Acquiring, in a preset order, motion information of N mutually different image blocks having a preset positional relationship with the to-be-processed image block as the N candidate predicted motion information.
  14. 根据权利要求1至12任一项所述的方法,其特征在于,所述确定N个候选预测运动信息,包括:The method according to any one of claims 1 to 12, wherein the determining the N candidate predicted motion information comprises:
    按照预设的顺序,获取M个互不相同的与所述待处理图像块具有预设位置关系的图像块的运动信息作为M个候选预测运动信息,其中,所述M个候选预测运动信息包括所述N个候选预测运动信息,M为大于N的整数;Acquiring, in a preset order, motion information of M mutually different image blocks having a preset positional relationship with the to-be-processed image block as M candidate predicted motion information, where the M candidate predicted motion information include the N candidate predicted motion information, and M is an integer greater than N;
    确定所述M个候选预测运动信息的分组方式;Determining a grouping manner of the M candidate prediction motion information;
    根据所述目标标识信息和所述分组方式,从所述M个候选预测运动信息中确定所述N个候选预测运动信息。Determining the N candidate predicted motion information from the M candidate predicted motion information according to the target identification information and the grouping manner.
  15. 根据权利要求14所述的方法,其特征在于,所述确定所述M个候选预测运动信息的分组方式,包括:The method according to claim 14, wherein the determining a grouping manner of the M candidate prediction motion information comprises:
    确定预设的所述分组方式;或者,Determining the preset grouping manner; or,
    从所述码流中解析获得所述分组方式。The packet mode is obtained by parsing from the code stream.
  16. 根据权利要求1至12任一项所述的方法,其特征在于,所述确定N个候选预测运动信息,包括:The method according to any one of claims 1 to 12, wherein the determining the N candidate predicted motion information comprises:
    解析所述码流中的所述多个候选预测运动信息的编码信息,以获得所述N个候选预测运动信息;或者,Parsing coding information of the plurality of candidate prediction motion information in the code stream to obtain the N candidate prediction motion information; or
    解析所述码流中的第二标识信息,以获得所述第二标识信息指示的N个候选图像块,并以所述N个候选图像块的运动信息作为所述N个候选预测运动信息;或者,Parsing second identification information in the code stream to obtain N candidate image blocks indicated by the second identification information, and using the motion information of the N candidate image blocks as the N candidate predicted motion information; or
    解析所述码流中的第三标识信息,以获得与所述第三标识信息具有预设对应关系的所述N个候选预测运动信息。Parsing third identification information in the code stream to obtain the N candidate predicted motion information that has a preset correspondence with the third identification information.
  17. 根据权利要求1至16任一项所述的方法,其特征在于,在所述获取所述N个候选预测运动信息各自对应的失真值之前,所述方法还包括:The method according to any one of claims 1 to 16, wherein before the obtaining the distortion value corresponding to each of the N candidate prediction motion information, the method further comprises:
    确定所述待处理图像块的相邻重构图像块可用。It is determined that adjacent reconstructed image blocks of the image block to be processed are available.
  18. 根据权利要求17所述的方法,其特征在于,当所述待处理图像块的相邻重构图像块包括至少两个所述原始相邻重构图像块时,所述确定所述待处理图像块的相邻重构图像块可用,包括:The method according to claim 17, wherein, when the adjacent reconstructed image blocks of the to-be-processed image block include at least two of the original adjacent reconstructed image blocks, the determining that the adjacent reconstructed image blocks of the to-be-processed image block are available comprises:
    确定所述至少两个所述原始相邻重构图像块中的至少一个原始相邻重构图像块可用。Determining that at least one of the at least two of the original adjacent reconstructed image blocks is available.
  19. 根据权利要求1至18任一项所述的方法,其特征在于,在所述确定N个候选预测运动信息之后,所述方法还包括:The method according to any one of claims 1 to 18, wherein after the determining the N candidate predicted motion information, the method further comprises:
    确定执行所述获取所述N个候选预测运动信息各自对应的失真值。Determining to perform the obtaining the distortion value corresponding to each of the N candidate predicted motion information.
  20. 根据权利要求19所述的方法,其特征在于,所述确定执行所述获取所述N个候选预测运动信息各自对应的失真值,包括:The method according to claim 19, wherein the determining to perform the acquiring the distortion value corresponding to each of the N candidate prediction motion information comprises:
    根据所述分组方式,确定执行所述获取所述N个候选预测运动信息各自对应的失真值;或者,解析所述码流中的第四标识信息以确定执行所述获取所述N个候选预测运动信息各自对应的失真值。Determining, according to the grouping manner, to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information; or parsing fourth identification information in the code stream to determine to perform the obtaining of the distortion value corresponding to each of the N candidate predicted motion information.
  21. 一种图像块预测运动信息的解码装置,其特征在于,包括:A decoding device for predicting motion information of an image block, comprising:
    解析模块,用于从码流中解析出待处理图像块的目标预测运动信息的目标标识信息;a parsing module, configured to parse target identification information of the target predicted motion information of the to-be-processed image block from the code stream;
    获取模块,用于确定N个候选预测运动信息,所述N个候选预测运动信息包括所述目标预测运动信息,其中,N为大于1的整数;And an obtaining module, configured to determine N candidate predicted motion information, where the N candidate predicted motion information includes the target predicted motion information, where N is an integer greater than one;
    计算模块,用于获取所述N个候选预测运动信息各自对应的失真值,所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定;a calculation module, configured to obtain a distortion value corresponding to each of the N candidate predicted motion information, where the distortion value is determined by an adjacent reconstructed image block of a reference image block indicated by the candidate predicted motion information and an adjacent reconstructed image block of the to-be-processed image block;
    比较模块,用于根据所述获取的N个失真值之间的大小关系,确定所述N个候选预测运动信息各自的第一标识信息,所述N个候选预测运动信息和各自的第一标识信息一一对应;a comparison module, configured to determine, according to the magnitude relationship among the obtained N distortion values, first identification information of each of the N candidate predicted motion information, where the N candidate predicted motion information and the respective first identification information are in one-to-one correspondence;
    选择模块,用于将与所述目标标识信息匹配的第一标识信息对应的候选预测运动信息确定为所述目标预测运动信息。And a selection module, configured to determine candidate predicted motion information corresponding to the first identifier information that matches the target identifier information as the target predicted motion information.
  22. 根据权利要求21所述的装置,其特征在于,所述候选预测运动信息指示的参考图像块的相邻重构图像块包括参考相邻重构图像块,所述待处理图像块的相邻重构图像块包括与所述参考相邻重构图像块对应的原始相邻重构图像块,所述失真值由所述候选预测运动信息指示的参考图像块的相邻重构图像块和所述待处理图像块的相邻重构图像块确定,包括:The apparatus according to claim 21, wherein the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information comprises a reference adjacent reconstructed image block, the adjacent reconstructed image block of the to-be-processed image block comprises an original adjacent reconstructed image block corresponding to the reference adjacent reconstructed image block, and that the distortion value is determined by the adjacent reconstructed image block of the reference image block indicated by the candidate predicted motion information and the adjacent reconstructed image block of the to-be-processed image block comprises:
    所述失真值由所述参考相邻重构图像块和所述原始相邻重构图像块的差异表征值来表示,所述参考相邻重构图像块与所述原始相邻重构图像块形状相同、大小相等,且所述参考相邻重构图像块和所述参考图像块之间的位置关系与所述原始相邻重构图像块和所述待处理图像块之间的位置关系相同。The distortion value is represented by a difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block, where the reference adjacent reconstructed image block and the original adjacent reconstructed image block have the same shape and equal size, and the positional relationship between the reference adjacent reconstructed image block and the reference image block is the same as the positional relationship between the original adjacent reconstructed image block and the to-be-processed image block.
  23. The apparatus according to claim 22, wherein the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises:
    the mean absolute difference (MAD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    the sum of absolute differences (SAD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    the sum of squared differences (SSD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    the mean squared difference (MSD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    the sum of absolute Hadamard-transformed differences (SATD) of the reference adjacent reconstructed image block and the original adjacent reconstructed image block;
    the normalized cross-correlation (NCC) measure of the reference adjacent reconstructed image block and the original adjacent reconstructed image block; or,
    the similarity measure of the reference adjacent reconstructed image block and the original adjacent reconstructed image block based on sequential similarity detection (SSDA).
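For illustration only (the claims do not mandate any particular implementation), the first four difference characterization values listed in claim 23 can be sketched for two equally sized templates held as 2-D pixel arrays. The function names and the NumPy representation are assumptions of this sketch; SATD, NCC and SSDA are omitted for brevity:

```python
import numpy as np

def mad(a, b):
    # Mean absolute difference between two equally sized templates.
    return np.mean(np.abs(a.astype(np.int64) - b.astype(np.int64)))

def sad(a, b):
    # Sum of absolute differences.
    return int(np.sum(np.abs(a.astype(np.int64) - b.astype(np.int64))))

def ssd(a, b):
    # Sum of squared differences.
    d = a.astype(np.int64) - b.astype(np.int64)
    return int(np.sum(d * d))

def msd(a, b):
    # Mean squared difference.
    d = a.astype(np.int64) - b.astype(np.int64)
    return np.mean(d * d)

ref = np.array([[10, 12], [14, 16]])  # reference adjacent reconstructed template
org = np.array([[11, 12], [13, 18]])  # original adjacent reconstructed template
print(sad(ref, org))  # per-pixel |diffs| are 1, 0, 1, 2 -> 4
```

Any of these could serve as the Delta of claim 30; SAD is the cheapest and is therefore a common default in template-matching cost functions.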
  24. The apparatus according to claim 22 or 23, wherein the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the lower boundary of the original adjacent reconstructed image block is adjacent to the upper boundary of the to-be-processed image block, comprising:
    the original adjacent reconstructed image block has a width of W and a height of n; or,
    the original adjacent reconstructed image block has a width of W+H and a height of n, where W, H and n are positive integers.
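As a hypothetical sketch of the geometry in claim 24 (not part of the claims), the above-neighbouring template of a W×H block can be sliced out of a reconstructed frame stored as a 2-D array; the function name and parameters are assumptions of this example:

```python
import numpy as np

def above_template(frame, x, y, W, H, n, extended=False):
    """Return the reconstructed template adjoining the top edge of the
    W-by-H block whose top-left corner is at column x, row y.
    extended=False gives the W-wide variant; extended=True gives the
    (W+H)-wide variant; both are n rows high, as in claim 24."""
    width = W + H if extended else W
    return frame[y - n:y, x:x + width]

frame = np.arange(64).reshape(8, 8)   # toy 8x8 "reconstructed" frame
t = above_template(frame, 2, 3, W=2, H=2, n=1)
print(t.shape)  # (1, 2)
t_ext = above_template(frame, 2, 3, W=2, H=2, n=1, extended=True)
print(t_ext.shape)  # (1, 4)
```

The left-neighbouring template of claim 26 is the transpose of this idea: width n, height H or W+H.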
  25. The apparatus according to claim 24, wherein n is 1 or 2.
  26. The apparatus according to claim 22 or 23, wherein the to-be-processed image block is a rectangle with width W and height H, the original adjacent reconstructed image block is a rectangle, and the right boundary of the original adjacent reconstructed image block is adjacent to the left boundary of the to-be-processed image block, comprising:
    the original adjacent reconstructed image block has a width of n and a height of H; or,
    the original adjacent reconstructed image block has a width of n and a height of W+H, where W, H and n are positive integers.
  27. The apparatus according to claim 26, wherein n is 1 or 2.
  28. The apparatus according to any one of claims 22 to 27, wherein the reference image block indicated by the candidate predicted motion information includes a first reference image block and a second reference image block; correspondingly, the adjacent reconstructed image blocks of the reference image block indicated by the candidate predicted motion information include a first reference adjacent reconstructed image block and a second reference adjacent reconstructed image block; and correspondingly, the representation of the distortion value by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises:
    the distortion value is represented by a difference characterization value of an average reference adjacent reconstructed image block and the original adjacent reconstructed image block, wherein the average reference adjacent reconstructed image block is obtained by computing the pixel-wise mean of the first reference adjacent reconstructed image block and the second reference adjacent reconstructed image block; or,
    the distortion value is represented by the mean of a first difference characterization value and a second difference characterization value, wherein the first difference characterization value is the difference characterization value of the first reference adjacent reconstructed image block and the original adjacent reconstructed image block, and the second difference characterization value is the difference characterization value of the second reference adjacent reconstructed image block and the original adjacent reconstructed image block.
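The two bi-prediction variants of claim 28 can be contrasted in a short sketch; this is an illustration under assumed names (SAD as the difference characterization value, unrounded averaging), not the claimed implementation:

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences on float arrays.
    return float(np.sum(np.abs(np.asarray(a, float) - np.asarray(b, float))))

def bipred_distortion(ref0, ref1, original, use_average_template=True):
    if use_average_template:
        # Variant 1: average the two reference templates pixel-wise,
        # then compare the averaged template against the original.
        avg = (np.asarray(ref0, float) + np.asarray(ref1, float)) / 2.0
        return sad(avg, original)
    # Variant 2: compute a distortion per reference template and
    # average the two distortion values.
    return (sad(ref0, original) + sad(ref1, original)) / 2.0

ref0 = np.array([[0, 0]])
ref1 = np.array([[2, 4]])
org = np.array([[1, 1]])
print(bipred_distortion(ref0, ref1, org, True))   # avg = [[1, 2]] -> 1.0
print(bipred_distortion(ref0, ref1, org, False))  # (2 + 4) / 2 -> 3.0
```

The two variants generally differ: averaging templates first lets opposite-signed errors cancel pixel-wise, while averaging distortions does not.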
  29. The apparatus according to any one of claims 22 to 28, wherein the adjacent reconstructed image blocks of the reference image block indicated by the candidate predicted motion information include a plurality of the reference adjacent reconstructed image blocks, the plurality of reference adjacent reconstructed image blocks including a third reference adjacent reconstructed image block and a fourth reference adjacent reconstructed image block; correspondingly, the adjacent reconstructed image blocks of the to-be-processed image block include a plurality of the original adjacent reconstructed image blocks, the plurality of original adjacent reconstructed image blocks including a third original adjacent reconstructed image block and a fourth original adjacent reconstructed image block; and the representation of the distortion value by the difference characterization value of the reference adjacent reconstructed image block and the original adjacent reconstructed image block comprises: the distortion value is represented by the sum of the difference characterization value of the third reference adjacent reconstructed image block and the third original adjacent reconstructed image block and the difference characterization value of the fourth reference adjacent reconstructed image block and the fourth original adjacent reconstructed image block.
  30. The apparatus according to claim 29, wherein the distortion value is obtained according to the following formula:
    Distortion = Σ_{i=1}^{p} |Delta(Original_i, Reference_i)|
    where Distortion denotes the distortion value, |Delta(Original_i, Reference_i)| denotes the difference characterization value of the i-th original adjacent reconstructed image block and the i-th reference adjacent reconstructed image block, and p denotes the number of original adjacent reconstructed image blocks used to calculate the distortion value.
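The summation in claim 30 is a direct accumulation over template pairs (for example, one above template and one left template). A minimal sketch, assuming SAD as the Delta function and with all names hypothetical:

```python
import numpy as np

def sad(a, b):
    # One possible Delta: sum of absolute differences.
    return int(np.sum(np.abs(np.asarray(a, np.int64) - np.asarray(b, np.int64))))

def total_distortion(originals, references, delta=sad):
    # Distortion = sum over i = 1..p of |Delta(Original_i, Reference_i)|,
    # where p = len(originals) is the number of template pairs.
    assert len(originals) == len(references)
    return sum(abs(delta(o, r)) for o, r in zip(originals, references))

originals = [np.array([[1, 2]]), np.array([[3], [4]])]   # above + left
references = [np.array([[1, 1]]), np.array([[5], [4]])]
print(total_distortion(originals, references))  # 1 + 2 = 3
```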
  31. The apparatus according to any one of claims 21 to 30, wherein the comparison module is specifically configured to:
    compare the magnitudes of the N distortion values; and
    assign the first identification information to each of the N candidate predicted motion information according to the comparison result, wherein the length of the binary string (bin string) of the first identification information of a candidate predicted motion information with a smaller distortion value is less than or equal to the length of the binary string of the first identification information of a candidate predicted motion information with a larger distortion value.
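One way to satisfy the length constraint of claim 31 is to rank the candidates by distortion and assign truncated-unary bin strings, so the most plausible candidate gets the shortest code. This is an illustrative sketch with hypothetical names, not the claimed binarization:

```python
def assign_bin_strings(candidates, distortions):
    """Order candidates by ascending distortion and assign each a
    truncated-unary bin string, so a candidate with a smaller distortion
    never receives a longer bin string than one with a larger distortion."""
    order = sorted(range(len(candidates)), key=lambda i: distortions[i])
    n = len(candidates)
    bins = {}
    for rank, idx in enumerate(order):
        # Truncated unary: 0, 10, 110, ...; the last code drops the
        # terminating 0 since it is the only remaining possibility.
        bins[candidates[idx]] = '1' * rank + ('0' if rank < n - 1 else '')
    return bins

print(assign_bin_strings(['a', 'b', 'c'], [5, 1, 3]))
# {'b': '0', 'c': '10', 'a': '11'}
```

When distortions tie, Python's stable sort keeps the original candidate order, which also satisfies the "less than or equal" length condition.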
  32. The apparatus according to claim 31, wherein the comparison module is specifically configured to:
    arrange the N candidate predicted motion information in order of the distortion values from smallest to largest or from largest to smallest.
  33. The apparatus according to any one of claims 21 to 32, wherein the acquisition module is specifically configured to:
    acquire, in a preset order, motion information of N mutually different image blocks having a preset positional relationship with the to-be-processed image block as the N candidate predicted motion information.
  34. The apparatus according to any one of claims 21 to 32, wherein the acquisition module is specifically configured to:
    acquire, in a preset order, motion information of M mutually different image blocks having a preset positional relationship with the to-be-processed image block as M candidate predicted motion information, wherein the M candidate predicted motion information include the N candidate predicted motion information, and M is an integer greater than N;
    determine a grouping manner of the M candidate predicted motion information; and
    determine the N candidate predicted motion information from the M candidate predicted motion information according to the target identification information and the grouping manner.
  35. The apparatus according to claim 34, wherein the acquisition module is specifically configured to:
    determine the preset grouping manner; or,
    obtain the grouping manner by parsing the bitstream.
  36. The apparatus according to any one of claims 21 to 32, wherein the acquisition module is specifically configured to:
    parse coding information of the plurality of candidate predicted motion information in the bitstream to obtain the N candidate predicted motion information; or,
    parse second identification information in the bitstream to obtain N candidate image blocks indicated by the second identification information, and use the motion information of the N candidate image blocks as the N candidate predicted motion information; or,
    parse third identification information in the bitstream to obtain the N candidate predicted motion information having a preset correspondence with the third identification information.
  37. The apparatus according to any one of claims 21 to 36, wherein the apparatus further comprises:
    a detection module, configured to determine that the adjacent reconstructed image blocks of the to-be-processed image block are available.
  38. The apparatus according to claim 37, wherein when the adjacent reconstructed image blocks of the to-be-processed image block include at least two of the original adjacent reconstructed image blocks, the detection module is specifically configured to:
    determine that at least one of the at least two original adjacent reconstructed image blocks is available.
  39. The apparatus according to any one of claims 34 to 38, wherein the apparatus further comprises:
    a decision module, configured to determine to perform the acquisition of the distortion values respectively corresponding to the N candidate predicted motion information.
  40. The apparatus according to claim 39, wherein the decision module is specifically configured to:
    determine, according to the grouping manner, to perform the acquisition of the distortion values respectively corresponding to the N candidate predicted motion information; or, parse fourth identification information in the bitstream to determine to perform the acquisition of the distortion values respectively corresponding to the N candidate predicted motion information.
  41. A decoding apparatus for predicted motion information of an image block, comprising:
    a memory and a processor coupled to the memory;
    wherein the processor is configured to execute instructions stored in the memory to perform the method according to any one of claims 1 to 20.
PCT/CN2018/102632 2017-09-12 2018-08-28 Encoding and decoding method and apparatus for motion information WO2019052330A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710818690.5 2017-09-12
CN201710818690.5A CN109495738B (en) 2017-09-12 2017-09-12 Coding and decoding method and device for motion information

Publications (1)

Publication Number Publication Date
WO2019052330A1 true WO2019052330A1 (en) 2019-03-21

Family

ID=65687803

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102632 WO2019052330A1 (en) 2017-09-12 2018-08-28 Encoding and decoding method and apparatus for motion information

Country Status (2)

Country Link
CN (1) CN109495738B (en)
WO (1) WO2019052330A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095322A (en) * 2023-04-10 2023-05-09 深圳传音控股股份有限公司 Image processing method, processing apparatus, and storage medium

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN113382262B (en) * 2019-06-21 2022-03-08 杭州海康威视数字技术股份有限公司 Method and device for decoding and encoding prediction mode
CN111050166B (en) * 2019-12-02 2023-08-15 咪咕视讯科技有限公司 Prediction mode determination method, apparatus, and computer-readable storage medium
CN111885389B (en) * 2020-07-24 2021-08-24 腾讯科技(深圳)有限公司 Multimedia data coding method, device and storage medium
CN113852823B (en) * 2021-11-30 2022-03-01 深圳市通恒伟创科技有限公司 Image data uploading method, system and device based on Internet of things

Citations (4)

Publication number Priority date Publication date Assignee Title
US7627037B2 (en) * 2004-02-27 2009-12-01 Microsoft Corporation Barbell lifting for multi-layer wavelet coding
CN101945276A (en) * 2009-07-03 2011-01-12 英特尔公司 Decoder side motion estimation (me) using plural reference frames
CN103314586A (en) * 2011-01-12 2013-09-18 佳能株式会社 Video encoding and decoding with improved error resilience
CN106851306A (en) * 2011-01-12 2017-06-13 太阳专利托管公司 Dynamic image decoding method and dynamic image decoding device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US20100166073A1 (en) * 2008-12-31 2010-07-01 Advanced Micro Devices, Inc. Multiple-Candidate Motion Estimation With Advanced Spatial Filtering of Differential Motion Vectors
FR2948522B1 (en) * 2009-07-21 2011-07-29 Canon Kk METHOD AND DEVICE FOR ESTIMATING A MOTION VECTOR
JP5298060B2 (en) * 2010-04-02 2013-09-25 日本放送協会 Prediction vector generator, encoding device, decoding device, and program
KR101827939B1 (en) * 2011-12-13 2018-02-12 주식회사 스카이미디어테크 Method of adaptive intra prediction mode encoding and apparatus for the same, and method of decoding and apparatus for the same
CN104427345B (en) * 2013-09-11 2019-01-08 华为技术有限公司 Acquisition methods, acquisition device, Video Codec and its method of motion vector


Also Published As

Publication number Publication date
CN109495738B (en) 2023-02-07
CN109495738A (en) 2019-03-19

Similar Documents

Publication Publication Date Title
KR102621943B1 (en) Adaptive motion vector precision for video coding
KR102404598B1 (en) Merge candidates for motion vector prediction for video coding
RU2719296C2 (en) Determining the motion information output mode during video coding
CN112655218B (en) Inter-frame prediction method and device
WO2019120305A1 (en) Method for predicting motion information of image block , device and codec
CA2932811C (en) Adaptive motion vector resolution signaling for video coding
US20200228796A1 (en) Local illumination compensation in video coding
WO2019147826A1 (en) Advanced motion vector prediction speedups for video coding
KR20200058445A (en) Low complexity design for FRUC
TW201924343A (en) Affine prediction in video coding
KR20190054082A (en) Motion vector coding for video coding
TW201811055A (en) Intra video coding using a decoupled tree structure
TW201818720A (en) Intra video coding using a decoupled tree structure
CN109495738B (en) Coding and decoding method and device for motion information
KR101747058B1 (en) Sub-pu-level advanced residual prediction
JP7407741B2 (en) Video encoding method and device
KR102643315B1 (en) Motion vector acquisition methods, devices, computer equipment, and storage media
CN112437299B (en) Inter-frame prediction method, device and storage medium
WO2019191867A1 (en) Method and apparatus for video encoding and decoding
WO2019000443A1 (en) Inter-frame prediction method and device
RU2785725C2 (en) Device and method for external prediction
WO2020024275A1 (en) Inter-frame prediction method and device
WO2021102315A1 (en) Early termination of motion vector refinement process in video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18856664

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18856664

Country of ref document: EP

Kind code of ref document: A1