WO2020052653A1 - Decoding method and device for predicted motion information - Google Patents


Info

Publication number
WO2020052653A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion information
candidate
motion vector
identifier
list
Prior art date
Application number
PCT/CN2019/105711
Other languages
French (fr)
Chinese (zh)
Inventor
陈旭 (Chen Xu)
郑建铧 (Zheng Jianhua)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201811264674.7A (CN110896485B)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP19860217.9A (EP3843404A4)
Priority to SG11202102362UA
Priority to BR112021004429-9A (BR112021004429A2)
Priority to KR1020247028818A (KR20240135033A)
Priority to KR1020217010321A (KR102701208B1)
Priority to JP2021513418A (JP7294576B2)
Priority to CA3112289A (CA3112289A1)
Publication of WO2020052653A1
Priority to US17/198,544 (US20210203944A1)
Priority to ZA2021/01890A (ZA202101890B)

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 — Motion estimation or motion compensation
    • H04N19/513 — Processing of motion vectors
    • H04N19/517 — Processing of motion vectors by encoding
    • H04N19/52 — Processing of motion vectors by encoding by predictive encoding

Definitions

  • the present application relates to the technical field of video encoding and decoding, and in particular, to a decoding method and device for predicted motion information.
  • Digital video technology can be widely used in various devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), notebook computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, video streaming devices, and the like.
  • Digital video devices implement video decoding technology to effectively send, receive, encode, decode, and/or store digital video information.
  • Video compression techniques perform spatial (intra-image) prediction and/or temporal (inter-image) prediction to reduce or remove redundant information inherent in a video sequence.
  • the basic principle of video compression is to remove redundancy as much as possible by exploiting the correlations in the spatial domain, in the temporal domain, and between codewords.
  • the current popular approach is to use a block-based hybrid video coding framework to achieve video coding compression through prediction (including intra prediction and inter prediction), transformation, quantization, and entropy coding.
  • Inter prediction exploits the temporal correlation of video by using the pixels of adjacent coded images to predict the pixels of the current image, so as to effectively remove temporal redundancy in the video.
  • the predicted motion information of each image block is determined from the candidate motion information list, thereby generating its prediction block through a motion compensation process.
  • the motion information includes reference image information and motion vectors.
  • the reference picture information includes unidirectional / bidirectional prediction information, a reference picture list, and a reference picture index corresponding to the reference picture list.
  • Motion vectors refer to horizontal and vertical position shifts.
  • the embodiments of the present application provide a decoding method and device for predicting motion information, which can effectively control the length of the candidate motion information list when more candidate motion information is introduced.
  • a first aspect of the embodiments of the present application provides a decoding method for predicted motion information, including: parsing a bitstream to obtain a first identifier; determining a target element from a first candidate set according to the first identifier, where the elements in the first candidate set include at least one piece of first candidate motion information and a plurality of pieces of second candidate motion information, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset; when the target element is the first candidate motion information, using that first candidate motion information as the target motion information, where the target motion information is used to predict the motion information of an image block to be processed; and when the target element is obtained based on the plurality of second candidate motion information, parsing the bitstream to obtain a second identifier and determining the target motion information from the plurality of second candidate motion information according to the second identifier.
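The two-level selection described above can be sketched as follows. This is an illustrative toy model, not the patent's normative syntax: the data structures, function names, and the quarter-pel-free integer motion vectors are all assumptions made for clarity.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

MV = Tuple[int, int]  # hypothetical motion vector: (horizontal, vertical) shift

@dataclass
class OffsetGroup:
    """Models second candidate motion information: first motion info (base)
    plus a set of preset motion information offsets."""
    base: MV
    offsets: List[MV]

Element = Union[MV, OffsetGroup]

def decode_target_mv(candidates: List[Element], first_id: int,
                     second_id: int = 0) -> MV:
    """The first identifier selects an element of the first candidate set.
    If that element is an offset group, the second identifier selects which
    preset offset is added to the base motion vector; otherwise the element
    is used directly as the target motion information."""
    target = candidates[first_id]
    if isinstance(target, OffsetGroup):
        dx, dy = target.offsets[second_id]
        return (target.base[0] + dx, target.base[1] + dy)
    return target

# Toy first candidate set: one direct candidate and one offset group.
first_candidate_set: List[Element] = [
    (4, -2),
    OffsetGroup(base=(4, -2), offsets=[(1, 0), (-1, 0), (0, 1), (0, -1)]),
]
print(decode_target_mv(first_candidate_set, 0))     # (4, -2)
print(decode_target_mv(first_candidate_set, 1, 2))  # (4, -1)
```

Note how the offset group occupies a single slot of the first candidate set even though it represents four derivable candidates, which is the length-control idea of the multi-layer structure.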
  • the elements in the first candidate set include the first candidate motion information and a plurality of second candidate motion information.
  • With the multi-layer candidate set structure, when more candidates are introduced, a set of candidate motion information can be added as a single element to the first candidate set.
  • the length of the first candidate set is greatly shortened.
  • the first candidate set is a candidate motion information list for inter prediction; even when more candidates are introduced, the length of the candidate motion information list can be well controlled, which facilitates the detection process and hardware implementation.
  • the first identifier may be a category identifier, which is used to indicate a category to which the target element belongs.
  • the method for decoding predicted motion information provided in this embodiment of the present application may further include: parsing the bitstream to obtain a fourth identifier, where the fourth identifier is the index of the target element within the category of the first candidate set indicated by the first identifier.
  • the target element is uniquely determined by combining the fourth identifier with the first identifier.
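A minimal sketch of this two-part addressing; the category contents and names below are hypothetical placeholders, not from the patent:

```python
# Hypothetical: the first candidate set organized by category.
categories = {
    0: ["spatial_A1", "spatial_B1"],          # e.g. spatially derived candidates
    1: ["offset_group_0", "offset_group_1"],  # e.g. offset-based candidates
}

def select_target(first_id: int, fourth_id: int) -> str:
    """The first identifier names a category; the fourth identifier is the
    index of the target element within that category. Together they
    uniquely determine the target element."""
    return categories[first_id][fourth_id]

print(select_target(1, 0))  # offset_group_0
```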
  • the first candidate motion information includes motion information of spatially adjacent image blocks of the image block to be processed.
  • the first candidate motion information may be candidate motion information generated by a Merge mode.
  • the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
  • determining the target motion information based on one of the plurality of second candidate motion information according to the second identifier includes: determining a target offset from the plurality of preset motion information offsets according to the second identifier; and determining the target motion information based on the first motion information and the target offset.
  • the codeword used to identify the first motion information is the shortest.
  • the method for decoding predicted motion information provided in the present application may further include: parsing the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
  • before determining the target motion information based on one of the plurality of second candidate motion information according to the second identifier, the method further includes: multiplying the plurality of preset motion information offsets by the preset coefficient to obtain a plurality of adjusted motion information offsets.
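The coefficient adjustment above amounts to a uniform scaling of the preset offsets, which can be sketched in one line (an illustrative helper, not the patent's syntax):

```python
def scale_offsets(offsets, coeff):
    """Multiply each preset motion information offset by the preset
    coefficient carried by the third identifier, producing the adjusted
    offsets from which the target offset is later selected."""
    return [(dx * coeff, dy * coeff) for dx, dy in offsets]

preset = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(scale_offsets(preset, 2))  # [(2, 0), (-2, 0), (0, 2), (0, -2)]
```

A larger coefficient lets the same small table of preset offsets cover a wider search range around the first motion information.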
  • the target motion information is used to predict the motion information of the image block to be processed, which includes: using the target motion information as the motion information of the image block to be processed; or using the target motion information as the predicted motion information of the image block to be processed. After the motion information or predicted motion information of the image block to be processed is obtained, motion compensation is performed to generate its image block or prediction block.
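The motion compensation step can be sketched as a displaced block copy. This toy version handles integer-pel positions only (real codecs also interpolate fractional positions), and all names are illustrative:

```python
def motion_compensate(ref, x, y, w, h, mv):
    """Fetch the w-by-h prediction block for the block at (x, y) from the
    reference picture, displaced by the motion vector (dx, dy)."""
    dx, dy = mv
    return [row[x + dx : x + dx + w] for row in ref[y + dy : y + dy + h]]

# Toy 8x8 "reference picture" whose sample at row r, column c equals r*10 + c.
ref = [[r * 10 + c for c in range(8)] for r in range(8)]
print(motion_compensate(ref, 2, 2, 2, 2, (1, -1)))  # [[13, 14], [23, 24]]
```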
  • the second identifier may adopt a fixed-length encoding manner, which can reduce the number of bits occupied by the identifier.
  • the second identifier may adopt a variable-length encoding manner, so that more candidate motion information can be identified.
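The patent text does not specify particular binarizations; as one hedged illustration of the trade-off between the two options above, a fixed-length code spends the same number of bits on every index, while a unary-style variable-length code keeps small indices cheap and extends to any number of candidates:

```python
import math

def fixed_length(index, num_candidates):
    """Fixed-length binarization: every index costs ceil(log2(N)) bits."""
    bits = max(1, math.ceil(math.log2(num_candidates)))
    return format(index, f"0{bits}b")

def unary(index):
    """Variable-length (unary) binarization: small indices get short
    codewords, so more candidates can be identified without fixing N."""
    return "1" * index + "0"

print(fixed_length(2, 4))  # 10
print(unary(2))            # 110
```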
  • a second aspect of the embodiments of the present application provides another decoding method for predicted motion information, including: parsing a bitstream to obtain a first identifier; and determining a target element from a first candidate set according to the first identifier, where the elements in the first candidate set include at least one piece of first candidate motion information and at least one second candidate set.
  • the elements in the second candidate set include multiple second candidate motion information.
  • when the target element is the first candidate motion information, the first candidate motion information serving as the target element is used as the target motion information, and the target motion information is used to predict the motion information of the image block to be processed.
  • when the target element is a second candidate set, the bitstream is parsed to obtain a second identifier, and the target motion information is determined from the plurality of second candidate motion information according to the second identifier.
  • the elements in the first candidate set include the first candidate motion information and at least one second candidate set.
  • With the multi-layer candidate set structure, when more candidates are introduced, a type of candidate motion information set can be added as an element to the first candidate set.
  • the length of the first candidate set is greatly shortened.
  • the first candidate set is a candidate motion information list for inter prediction; even when more candidates are introduced, the length of the candidate motion information list can be well controlled, which facilitates the detection process and hardware implementation.
  • the first identifier may be a category identifier, which is used to indicate a category to which the target element belongs.
  • the method for decoding predicted motion information provided in this embodiment of the present application may further include: parsing the bitstream to obtain a fourth identifier, where the fourth identifier is the index of the target element within the category of the first candidate set indicated by the first identifier.
  • the target element is uniquely determined by combining the fourth identifier with the first identifier.
  • the first candidate motion information includes motion information of spatially adjacent image blocks of the image block to be processed.
  • the first candidate motion information may be candidate motion information generated by a Merge mode.
  • the second candidate motion information includes motion information of spatially non-adjacent image blocks of the image block to be processed.
  • the second candidate motion information may be candidate motion information generated by the Affine Merge mode.
  • the first candidate motion information includes first motion information
  • the second candidate motion information includes second motion information
  • the second motion information is obtained based on the first motion information and a preset motion information offset.
  • the first candidate motion information includes the first motion information
  • the second candidate motion information includes a preset motion information offset
  • determining the target motion information from the plurality of second candidate motion information includes: determining a target offset from a plurality of preset motion information offsets according to the second identifier; and determining the target motion information based on the first motion information and the target offset.
  • the first candidate motion information includes first motion information
  • the at least one second candidate set included in the first candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one third candidate set and at least one fourth candidate set.
  • the elements in the third candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed.
  • the elements in the fourth candidate set include a plurality of pieces of motion information obtained based on the first motion information and a preset motion information offset.
  • the codeword used to identify the first motion information is the shortest.
  • the first motion information does not include motion information obtained according to an alternative temporal motion vector prediction (ATMVP) mode.
  • the at least one second candidate set included in the first candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one fifth candidate set and at least one sixth candidate set.
  • the elements in the fifth candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed.
  • the elements in the sixth candidate set include a plurality of preset motion information offsets.
  • the decoding method for predicted motion information provided in this application may further include: parsing the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
  • before determining the target offset from the plurality of preset motion information offsets according to the second identifier, the method further includes: multiplying the plurality of preset motion information offsets by the preset coefficient included in the third identifier to obtain a plurality of adjusted motion information offsets. Correspondingly, determining the target offset from the plurality of preset motion information offsets according to the second identifier includes: determining the target offset from the plurality of adjusted motion information offsets according to the second identifier.
  • the second candidate motion information and the first candidate motion information are different.
  • the first candidate motion information and the second candidate motion information may be candidate motion information selected according to different inter prediction modes.
  • the target motion information is used to predict the motion information of the image block to be processed, which includes: using the target motion information as the motion information of the image block to be processed; or using the target motion information as the predicted motion information of the image block to be processed. After the motion information or predicted motion information of the image block to be processed is obtained, motion compensation is performed to generate its image block or prediction block.
  • the second identifier may adopt a fixed-length encoding manner, which can reduce the number of bits occupied by the identifier.
  • the second identifier may adopt a variable-length encoding manner, so that more candidate motion information can be identified.
  • a third aspect of the embodiments of the present application provides a decoding apparatus for predicted motion information, including: a parsing module configured to parse a bitstream to obtain a first identifier; and a determining module configured to determine a target element from a first candidate set according to the first identifier.
  • the elements in the first candidate set include at least one first candidate motion information and a plurality of second candidate motion information.
  • the first candidate motion information includes the first motion information
  • the second candidate motion information includes a preset motion information offset.
  • an assignment module configured to use the first candidate motion information as the target motion information when the target element is the first candidate motion information, and the target motion information is used to predict the motion information of the image block to be processed;
  • the parsing module is further configured to: when the target element is obtained based on the plurality of second candidate motion information, parse the bitstream to obtain a second identifier; and the determining module is further configured to determine the target motion information based on one of the plurality of second candidate motion information according to the second identifier.
  • the elements in the first candidate set include the first candidate motion information and a plurality of second candidate motion information.
  • With the multi-layer candidate set structure, when more candidates are introduced, a type of candidate motion information set can be added as an element to the first candidate set.
  • the length of the first candidate set is greatly shortened.
  • the first candidate set is a candidate motion information list for inter prediction; even when more candidates are introduced, the length of the candidate motion information list can be well controlled, which facilitates the detection process and hardware implementation.
  • the first candidate motion information may include motion information of a spatially adjacent image block of the image block to be processed.
  • the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
  • the parsing module is specifically configured to determine a target offset from the plurality of preset motion information offsets according to the second identifier, and determine the target motion information based on the first motion information and the target offset.
  • the codeword used to identify the first motion information is the shortest.
  • the parsing module is further configured to parse the code stream to obtain a third identifier, and the third identifier includes a preset coefficient.
  • the device further includes a calculation module configured to multiply the plurality of preset motion information offsets by the preset coefficient to obtain a plurality of adjusted motion information offsets.
  • the determination module is specifically configured to determine a target offset from the plurality of adjusted motion information offsets obtained by the calculation module according to the second identifier, and then determine the target motion information based on the first motion information and the target offset.
  • the determining module is specifically configured to use the target motion information as the motion information of the image block to be processed; or use the target motion information as the predicted motion information of the image block to be processed.
  • the second identifier adopts a fixed-length encoding manner.
  • the second identifier adopts a variable length coding method.
  • the decoding apparatus for predicted motion information provided in the third aspect of the embodiments of the present application is configured to execute the decoding method for predicted motion information provided in the first aspect; the specific implementations are the same, and details are not repeated here.
  • a fourth aspect of the embodiments of the present application provides a decoding apparatus for predicted motion information, including: a parsing module configured to parse a bitstream to obtain a first identifier; and a determining module configured to determine a target element from a first candidate set according to the first identifier.
  • the elements in the first candidate set include at least one first candidate motion information and at least one second candidate set.
  • the elements in the second candidate set include a plurality of pieces of second candidate motion information; the assignment module is configured to, when the target element is the first candidate motion information, use that first candidate motion information as the target motion information, where the target motion information is used to predict the motion information of the image block to be processed; the parsing module is further configured to, when the target element is a second candidate set, parse the bitstream to obtain a second identifier; and the determining module is further configured to determine the target motion information from the plurality of second candidate motion information according to the second identifier.
  • the elements in the first candidate set include the first candidate motion information and at least one second candidate set.
  • With the multi-layer candidate set structure, when more candidates are introduced, a type of candidate motion information set can be added as an element to the first candidate set.
  • the length of the first candidate set is greatly shortened.
  • the first candidate set is a candidate motion information list for inter prediction; even when more candidates are introduced, the length of the candidate motion information list can be well controlled, which facilitates the detection process and hardware implementation.
  • the first candidate motion information may include motion information of spatially adjacent image blocks of the image block to be processed.
  • the second candidate motion information may include motion information of spatially non-adjacent image blocks of the image block to be processed.
  • the first candidate motion information includes first motion information
  • the second candidate motion information includes second motion information
  • the second motion information is obtained based on the first motion information and a preset motion information offset.
  • the first candidate motion information includes first motion information
  • the second candidate motion information includes a preset motion information offset
  • the parsing module is specifically configured to: determine a target offset from the plurality of preset motion information offsets according to the second identifier; and determine the target motion information based on the first motion information and the target offset.
  • the first candidate motion information includes first motion information
  • the at least one second candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one third candidate set and at least one fourth candidate set.
  • the elements in the third candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed, and the elements in the fourth candidate set include a plurality of pieces of motion information obtained based on the first motion information and a preset motion information offset.
  • the codeword used to identify the first motion information is the shortest.
  • the first motion information does not include motion information obtained according to the ATMVP mode.
  • the at least one second candidate set is a plurality of second candidate sets, and the plurality of second candidate sets includes at least one fifth candidate set and at least one sixth candidate set.
  • the elements in the fifth candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed, and the elements in the sixth candidate set include a plurality of preset motion information offsets.
  • the parsing module is further configured to parse the code stream to obtain a third identifier, and the third identifier includes a preset coefficient.
  • the fourth aspect further includes a calculation module, configured to multiply a plurality of preset motion information offsets by a preset coefficient to obtain a plurality of adjusted motion information offsets.
  • the determination module is specifically configured to determine a target offset from the plurality of adjusted motion information offsets obtained from the calculation module according to the second identifier, and then determine the target motion information based on the first motion information and the target offset.
  • the second candidate motion information and the first candidate motion information are different.
  • the determining module is specifically configured to use the target motion information as the motion information of the image block to be processed; or use the target motion information as the predicted motion information of the image block to be processed.
  • the second identifier adopts a fixed-length encoding manner.
  • the second identifier adopts a variable length coding method.
  • a fifth aspect of the embodiments of the present application provides a decoding apparatus for predicted motion information, including: a processor and a memory coupled to the processor; the processor is configured to execute the decoding method for predicted motion information according to the first aspect or the second aspect.
  • a video decoder which includes a non-volatile storage medium and a central processing unit.
  • the non-volatile storage medium stores an executable program; the central processing unit is connected to the non-volatile storage medium and executes the decoding method for predicted motion information according to the first aspect and/or the second aspect, or any one of their possible implementations.
  • a computer-readable storage medium stores instructions; when the instructions are run on a computer, the computer is caused to execute the decoding method for predicted motion information according to the first aspect or the second aspect.
  • a computer program product including instructions is provided; when the instructions are run on a computer, the computer is caused to execute the decoding method for predicted motion information described in the first aspect or the second aspect.
  • FIG. 1 is an exemplary block diagram of a video decoding system that can be configured for use in an embodiment of the present application
  • FIG. 2 is an exemplary system block diagram of a video encoder that can be configured for use in an embodiment of the present application
  • FIG. 3 is an exemplary system block diagram of a video decoder that can be configured for use in embodiments of the present application
  • FIG. 4 is a block diagram of an exemplary inter prediction module that can be configured for use in an embodiment of the present application
  • FIG. 5 is an exemplary implementation flowchart of a merge prediction mode
  • FIG. 6 is an exemplary implementation flowchart of an advanced motion vector prediction mode
  • FIG. 7 is an exemplary implementation flowchart of motion compensation performed by a video decoder that can be configured for use in an embodiment of the present application
  • FIG. 8 is a schematic diagram of an exemplary coding unit and adjacent position image blocks associated with the coding unit
  • FIG. 9 is an exemplary implementation flowchart of constructing a candidate prediction motion vector list
  • FIG. 10 is an exemplary implementation diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list
  • FIG. 11 is an exemplary implementation diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list
  • FIG. 12 is an exemplary implementation diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list
  • FIG. 13 is a schematic diagram of another exemplary coding unit and adjacent position image blocks associated with the coding unit
  • FIG. 14A is a schematic diagram of an exemplary method for constructing a candidate motion vector set
  • FIG. 14B is a schematic diagram of an exemplary method for constructing a candidate motion vector set
  • FIG. 15 is a schematic flowchart of a decoding method for predicting motion information according to an embodiment of the present application
  • FIG. 16A is a schematic diagram of an exemplary method for constructing a candidate motion vector set
  • FIG. 16B is a schematic diagram of an exemplary method for constructing a candidate motion vector set
  • FIG. 16C is a schematic diagram of an exemplary method for constructing a candidate motion vector set
  • FIG. 17 is a schematic block diagram of a decoding apparatus for predicting motion information according to an embodiment of the present application.
  • FIG. 18 is a schematic block diagram of a decoding apparatus for predicting motion information according to an embodiment of the present application.
  • words such as “exemplary” or “for example” are used as examples, illustrations, or explanations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of the words “exemplary” or “for example” is intended to present the relevant concept in a concrete manner.
  • FIG. 1 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present application.
  • video coder generally refers to both video encoders and video decoders.
  • “video coding” or “coding” may generally refer to video encoding or video decoding.
  • the video encoder 100 and the video decoder 200 of the video decoding system 1 are configured to predict the motion information, such as the motion vector, of a currently decoded image block or a sub-block thereof according to any one of multiple new inter prediction modes, so that the predicted motion vector is as close as possible to the motion vector obtained by a motion estimation method; it is then unnecessary to transmit a motion vector difference during encoding, which further improves encoding and decoding performance.
  • the video decoding system 1 includes a source device 10 and a destination device 20.
  • the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
  • the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
  • Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
  • the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other media that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
  • the source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
  • the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
  • the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
  • the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
  • the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
  • the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
  • the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
  • the encoded data may be output from the output interface 140 to the storage device 40.
  • the encoded data can be accessed from the storage device 40 through the input interface 240.
  • the storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a digital video disc (DVD), and a compact disc-read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
  • the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
  • the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
  • Example file servers include a network server (for example, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive.
  • the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
  • This may include wireless channels (e.g., wireless-fidelity (Wi-Fi) connections), wired connections (e.g., digital subscriber line (DSL), cable modem, etc.), or a combination of both suitable for accessing the encoded video data stored on the file server.
  • the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
  • the decoding method for predicting motion information can be applied to video encoding and decoding to support a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications.
  • the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
  • the video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application can be applied to a video decoding setting (for example, video encoding or video decoding) that does not necessarily include any data communication between the encoding device and the decoding device.
  • data is retrieved from local storage, streamed over a network, and so on.
  • the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
  • encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and / or retrieve data from memory and decode data.
  • the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
  • the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
  • Video source 120 may include a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
  • the video encoder 100 may encode video data from the video source 120.
  • the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
  • the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
  • the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
  • the input interface 240 includes a receiver and / or a modem.
  • the input interface 240 may receive the encoded video data via the link 30 and / or from the storage device 40.
  • the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
  • the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
  • video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer unit or other hardware and software to handle encoding of both audio and video in a common or separate data stream.
  • the multiplexer-demultiplexer (MUX-DEMUX) unit may conform to the International Telecommunication Union (ITU) H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
  • Each of the video encoder 100 and the video decoder 200 may be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may use one or more processors to execute the instructions in hardware, thus implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
  • This application may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
  • the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode the compressed video data. This transfer can occur in real time or almost in real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in an encoded bitstream to a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after the syntax element is stored on this medium.
  • H.265 High Efficiency Video Coding (HEVC)
  • the HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM).
  • the latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265.
  • the latest version of the standard document is H.265 (12/16), which standard document is incorporated herein by reference in its entirety.
  • HM assumes that video decoding devices have several additional capabilities over existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM provides up to 35 intra-prediction encoding modes.
  • the H.266 test model (JEM) is an evolution model of the video decoding device.
  • the algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet. The latest algorithm description is included in JVET-F1001-v2.
  • the algorithm description document is incorporated herein by reference in its entirety.
  • the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/ and is also incorporated herein by reference in its entirety.
  • HM can divide a video frame or image into a sequence of tree blocks or largest coding units (LCUs) that contain both luma and chroma samples.
  • an LCU is also called a coding tree unit (CTU).
  • the tree block serves a purpose similar to that of the macroblock of the H.264 standard.
  • a slice contains several consecutive tree blocks in decoding order.
  • a video frame or image can be split into one or more slices.
  • Each tree block can be split into coding units according to a quadtree. For example, a tree block that is a root node of a quad tree may be split into four child nodes, and each child node may be a parent node and split into another four child nodes.
  • the final indivisible child nodes that are leaf nodes of the quadtree include decoding nodes, such as decoded video blocks.
  • the syntax data associated with the decoded codestream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
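The recursive quadtree splitting described above can be sketched as follows. This is an illustrative sketch, not code from any codec reference implementation; the function name and the content-independent "always split when allowed" policy are assumptions made for the example, and real encoders decide each split by rate-distortion cost.

```python
# Hypothetical sketch of the quadtree split described above: a tree block
# (root node) is recursively split into four child nodes until either the
# maximum split depth or the minimum decoding-node size is reached.
def quadtree_leaves(x, y, size, depth, max_depth, min_size):
    """Return (x, y, size) tuples for the leaf (decoding) nodes."""
    if depth == max_depth or size // 2 < min_size:
        return [(x, y, size)]          # indivisible child node: a leaf
    half = size // 2
    leaves = []
    for dy in (0, half):               # four child nodes per parent node
        for dx in (0, half):
            leaves += quadtree_leaves(x + dx, y + dy, half,
                                      depth + 1, max_depth, min_size)
    return leaves

# A 64x64 tree block fully split down to depth 2 yields sixteen 16x16 leaves.
leaves = quadtree_leaves(0, 0, 64, 0, max_depth=2, min_size=16)
```

Here `max_depth` plays the role of the syntax data that defines the maximum number of times a tree block can be split, and `min_size` the minimum size of a decoding node.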
  • a coding unit (CU) includes a decoding node, as well as prediction units (PUs) and transform units (TUs) associated with the decoding node.
  • the size of the CU corresponds to the size of the decoding node and the shape must be square.
  • the size of the CU can range from 8×8 pixels up to a maximum of 64×64 pixels or a larger tree block size.
  • Each CU may contain one or more PUs and one or more TUs.
  • the syntax data associated with a CU may describe a case where a CU is partitioned into one or more PUs.
  • the partitioning mode may be different between cases where the CU is skipped or is encoded in direct mode, intra prediction mode, or inter prediction mode.
  • the PU can be divided into non-square shapes.
  • the syntax data associated with a CU may also describe a case where a CU is partitioned into one or more TUs according to a quadtree.
  • the shape of the TU can be square or non-square.
  • the HEVC standard allows transformation based on the TU, which can be different for different CUs.
  • the TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case.
  • the size of the TU is usually the same as or smaller than the PU.
  • a quad-tree structure called "residual quad-tree" (RQT) may be used to subdivide the residual samples corresponding to the CU into smaller units.
  • the leaf node of RQT may be called TU.
  • the pixel difference values associated with the TU may be transformed to produce a transformation coefficient, which may be quantized.
  • the PU contains data related to the prediction process.
  • the PU may include data describing the intra-prediction mode of the PU.
  • the PU may include data defining a motion vector of the PU.
  • the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel accuracy or eighth-pixel accuracy), the reference image to which the motion vector points, and/or the reference image list of the motion vector (e.g., list 0, list 1, or list C).
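The PU motion data fields listed above can be grouped as in the following sketch. The field names are assumptions chosen for illustration, not syntax element names from any standard.

```python
from dataclasses import dataclass

# Illustrative container for the PU motion data fields listed above.
@dataclass
class PuMotionInfo:
    mv_horizontal: int    # horizontal component of the motion vector
    mv_vertical: int      # vertical component of the motion vector
    mv_resolution: str    # e.g. "quarter-pel" or "eighth-pel" accuracy
    ref_idx: int          # index of the reference image pointed to
    ref_list: str         # reference image list: "list0", "list1" or "listC"

info = PuMotionInfo(mv_horizontal=6, mv_vertical=-2,
                    mv_resolution="quarter-pel", ref_idx=0, ref_list="list0")
```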
  • TU uses transform and quantization processes.
  • a given CU with one or more PUs may also contain one or more TUs.
  • video encoder 100 may calculate a residual value corresponding to the PU.
  • the residual values include pixel differences that can be transformed into transform coefficients, quantized, and scanned using the TUs to generate serialized transform coefficients for entropy decoding.
  • This application generally uses the term "video block" to refer to the decoding node of a CU.
  • the term “video block” may also be used in this application to refer to a tree block including a decoding node and a PU and a TU, such as an LCU or a CU.
  • a video sequence usually contains a series of video frames or images.
  • a group of pictures (GOP) exemplarily comprises a series of one or more video pictures.
  • the GOP may include syntax data in the header information of the GOP, the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP.
  • Each slice of the image may contain slice syntax data describing the coding mode of the corresponding image.
  • Video encoder 100 typically operates on video blocks within individual video slices to encode video data.
  • a video block may correspond to a decoding node within a CU.
  • Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
  • HM supports prediction with various PU sizes. Assuming the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%.
  • 2N×nU refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU at the top and a 2N×1.5N PU at the bottom.
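The four asymmetric modes above can be sketched as a helper that maps a mode name to the two PU sizes it produces. The function name and mode strings are illustrative assumptions, not identifiers from the HM software.

```python
# Hypothetical helper mapping the asymmetric inter-partition modes described
# above to the two PU sizes they produce for a 2Nx2N CU, as (width, height).
def amp_pu_sizes(mode, cu_size):
    quarter = cu_size // 4        # 0.5N, the 25% part
    three_q = 3 * cu_size // 4    # 1.5N, the 75% part
    if mode == "2NxnU":           # horizontal split, small PU on top
        return [(cu_size, quarter), (cu_size, three_q)]
    if mode == "2NxnD":           # horizontal split, small PU at bottom
        return [(cu_size, three_q), (cu_size, quarter)]
    if mode == "nLx2N":           # vertical split, small PU on the left
        return [(quarter, cu_size), (three_q, cu_size)]
    if mode == "nRx2N":           # vertical split, small PU on the right
        return [(three_q, cu_size), (quarter, cu_size)]
    raise ValueError(mode)

# A 32x32 CU in 2NxnU mode: a 32x8 PU on top and a 32x24 PU below it.
sizes = amp_pu_sizes("2NxnU", 32)
```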
  • “N×N” and “N by N” are used interchangeably to refer to the pixel size of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
  • an N×N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
  • Pixels in a block can be arranged in rows and columns.
  • the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
  • a block may include N×M pixels, where M is not necessarily equal to N.
  • the video encoder 100 may calculate the residual data of the TU of the CU.
  • a PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
  • the residual data may correspond to a pixel difference between a pixel of an uncoded image and a prediction value corresponding to a PU.
  • the video encoder 100 may form a TU including residual data of a CU, and then transform the TU to generate a transform coefficient of the CU.
  • video encoder 100 may perform quantization of the transform coefficients.
  • Quantization exemplarily refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
  • the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
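The bit-depth reduction described above can be sketched as discarding the low-order bits of each coefficient. This is a minimal sketch of the rounding idea only; real codecs divide by a quantization step size derived from a quantization parameter, which this example does not model.

```python
# Sketch of reducing an n-bit coefficient magnitude to m bits by dropping
# the (n - m) least-significant bits, as described above.
def quantize(coeff, n_bits, m_bits):
    shift = n_bits - m_bits
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> shift)

def dequantize(level, n_bits, m_bits):
    # Reconstruction is lossy: the discarded low bits cannot be recovered.
    shift = n_bits - m_bits
    sign = -1 if level < 0 else 1
    return sign * (abs(level) << shift)

# A 10-bit magnitude 517 quantized to 6 bits becomes level 32 (517 >> 4),
# which reconstructs to 512, i.e. a quantization error of 5.
level = quantize(517, 10, 6)
```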
  • the JEM model further improves the coding structure of video images.
  • a block coding structure called "Quad Tree Combined with Binary Tree” (QTBT) is introduced.
  • a CU can be square or rectangular.
  • a CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition.
  • there are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
  • the leaf nodes of a binary tree are called CUs, and JEM's CUs cannot be further divided during the prediction and transformation process, that is, JEM's CU, PU, and TU have the same block size.
  • the maximum size of the CTU is 256×256 luminance pixels.
  • the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
  • the video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, the video encoder 100 may entropy decode the one-dimensional vector using context-based adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy decoding method.
  • Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 to decode the video data.
  • video encoder 100 may assign a context within a context model to a symbol to be transmitted. Context can be related to whether adjacent values of a symbol are non-zero.
  • the video encoder 100 may select a variable length code of a symbol to be transmitted. Codewords in variable-length decoding (VLC) may be constructed such that relatively short codes correspond to more likely symbols, and longer codes correspond to less likely symbols. In this way, the use of VLC can achieve the goal of saving code rates relative to using equal length codewords for each symbol to be transmitted.
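The rate saving described above can be made concrete with a small numeric example. The prefix-free codeword table and symbol probabilities below are invented for illustration; they are not codes from any standard.

```python
# Illustrative VLC table: the most probable symbol gets the shortest
# codeword, compared against a 2-bit fixed-length code for four symbols.
vlc_table = {"A": "0", "B": "10", "C": "110", "D": "111"}   # prefix-free
probs     = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}   # assumed

# Expected bits per symbol under the VLC:
# 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits, versus 2 bits fixed.
expected_vlc_bits = sum(probs[s] * len(vlc_table[s]) for s in probs)
fixed_bits = 2
```

With this skewed distribution the VLC saves 0.25 bits per symbol on average, which is exactly the code-rate saving relative to equal-length codewords that the paragraph above describes.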
  • the probability in CABAC can be determined based on the context assigned to the symbol.
  • the video encoder may perform inter prediction to reduce temporal redundancy between images.
  • a CU may have one or more prediction units PU according to the provisions of different video compression codec standards.
  • multiple PUs may belong to a CU, or PUs and CUs are the same size.
  • in this application, the case where the CU's partitioning mode is not divided, or where the CU is divided into a single PU, is uniformly expressed using the PU.
  • the video encoder may signal the motion information of the PU to the video decoder.
  • the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
  • a motion vector may indicate a displacement between an image block (also called a video block, a pixel block, a pixel set, etc.) of a PU and a reference block of the PU.
  • the reference block of the PU may be a part of the reference picture similar to the image block of the PU.
  • the reference block may be located in a reference image indicated by a reference image index and a prediction direction identifier.
  • the video encoder may generate a candidate prediction motion vector (MV) list for each of the PUs according to a merge prediction mode or advanced motion vector prediction mode process.
  • Each candidate prediction motion vector in the candidate prediction motion vector list for the PU may indicate motion information, and the MV list may also be referred to as a candidate motion information list.
  • the motion information indicated by some candidate prediction motion vectors in the candidate prediction motion vector list may be based on the motion information of other PUs. If the candidate prediction motion vector indicates motion information specifying one of a spatial candidate prediction motion vector position or a temporal candidate prediction motion vector position, the present application may refer to the candidate prediction motion vector as an "original" candidate prediction motion vector.
  • in merge mode (also referred to herein as merge prediction mode), the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying the original candidate prediction motion vectors, or inserting only zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered original candidate prediction motion vectors and may be referred to in this application as artificially generated candidate prediction motion vectors.
  • the techniques of this application generally relate to a technique for generating a list of candidate prediction motion vectors at a video encoder and a technique for generating the same list of candidate prediction motion vectors at a video decoder.
  • the video encoder and video decoder may generate the same candidate prediction motion vector list by implementing the same techniques used to construct the candidate prediction motion vector list. For example, both a video encoder and a video decoder may build a list with the same number of candidate prediction motion vectors (eg, five candidate prediction motion vectors).
  • Video encoders and decoders may first consider spatial candidate prediction motion vectors (e.g., from neighboring blocks in the same image), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different images), and finally add artificially generated candidate prediction motion vectors until the desired number of candidate prediction motion vectors has been added to the list.
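The fill order above (spatial first, then temporal, then artificial padding up to a fixed count such as five) can be sketched as follows. The candidate values are invented; a real codec derives them from neighboring blocks, and its artificial candidates may also combine partial motion vectors rather than being all-zero.

```python
# Sketch of the candidate list construction order described above, assuming
# a target length of five and pruning of duplicate motion vectors.
def build_candidate_list(spatial, temporal, max_len=5):
    candidates = []
    for mv in spatial + temporal:          # spatial first, then temporal
        if mv not in candidates and len(candidates) < max_len:
            candidates.append(mv)
    while len(candidates) < max_len:       # pad with artificial candidates,
        candidates.append((0, 0))          # e.g. zero motion vectors
    return candidates

cands = build_candidate_list(spatial=[(4, 0), (4, 0), (-2, 1)],
                             temporal=[(3, 3)])
# -> [(4, 0), (-2, 1), (3, 3), (0, 0), (0, 0)]
```

Because the encoder and the decoder run this same deterministic procedure, they build identical lists without the list itself ever being transmitted, which is the point made two bullets above.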
  • a type of candidate prediction motion vector may be indicated in the candidate prediction motion vector list through an identification bit to control the length of the candidate prediction motion vector list.
  • the spatial candidate prediction motion vector set and the temporal candidate prediction motion vector can be used as the original candidate prediction motion vector.
  • an identification bit is added to the candidate prediction motion vector list to indicate an artificially generated candidate prediction motion vector set.
  • a prediction motion vector is selected from a set of candidate prediction motion vectors indicated by the identification bit.
  • the video encoder may select the candidate prediction motion vector from the candidate prediction motion vector list and output the candidate prediction motion vector index in the code stream.
  • the selected candidate prediction motion vector may be a candidate prediction motion vector having a motion vector that most closely matches the predictor of the target PU being decoded.
  • the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
  • the video encoder may also generate a predictive image block for the PU based on a reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector.
  • the video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PU of the CU and the original image blocks for the CU. The video encoder may then encode one or more residual image blocks and output one or more residual image blocks in a code stream.
  • the bitstream may include data identifying a selected candidate prediction motion vector in the candidate prediction motion vector list of the PU, which is referred to herein as an identifier or signal.
  • the data may include an index into the candidate prediction motion vector list, and the target motion vector is determined through the index; or the index may determine that the target motion vector is a certain type of candidate prediction motion vector, in which case the data further includes information indicating the specific position of the selected candidate prediction motion vector within that type of candidate prediction motion vector.
  • the video decoder can parse the bitstream to obtain the data identifying the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU, and determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector.
  • the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate predictive image blocks for the PU based on the one or more reference blocks of the PU. The video decoder may reconstruct an image block for a CU based on a predictive image block for a PU of the CU and one or more residual image blocks for the CU.
  • the present application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description can be interpreted to mean that the position or image block and the image block associated with the CU or PU have various spatial relationships.
  • a PU currently being decoded by a video decoder may be referred to as a current PU, and may also be referred to as a current image block to be processed.
  • This application may refer to the CU that the video decoder is currently decoding as the current CU.
  • This application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that this application is applicable to a case where the PU and the CU have the same size, or the PU is the CU, and the PU is used to represent the same.
  • video encoder 100 may use inter prediction to generate predictive image blocks and motion information for a PU of a CU.
  • the motion information of a given PU may be the same or similar to the motion information of one or more nearby PUs (ie, PUs whose image blocks are spatially or temporally near the image blocks of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may refer to the motion information of nearby PUs to encode motion information for a given PU. Encoding the motion information of a given PU with reference to the motion information of a nearby PU can reduce the number of encoded bits required to indicate the motion information of a given PU in the code stream.
  • Video encoder 100 may refer to motion information of nearby PUs in various ways to encode motion information for a given PU.
  • video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs.
  • This application may use a merge mode to refer to indicating that the motion information of a given PU is the same as that of nearby PUs or may be derived from the motion information of nearby PUs.
  • the video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU.
  • MVD Motion Vector Difference
  • MVD indicates the difference between the motion vector of a given PU and the motion vector of a nearby PU.
  • Video encoder 100 may include MVD instead of a motion vector of a given PU in the motion information of a given PU. Representing MVD in the codestream requires fewer coding bits than representing the motion vector of a given PU.
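The MVD scheme above can be sketched in a few lines: the encoder transmits only the difference between the PU's motion vector and a predictor taken from a nearby PU, and the decoder adds it back. Vectors are (x, y) pairs in quarter-pel units; all values are illustrative.

```python
# Minimal sketch of the MVD idea described above.
def encode_mvd(mv, predictor):
    # Transmit only the difference from the nearby PU's motion vector.
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def decode_mv(mvd, predictor):
    # The decoder reconstructs the motion vector from predictor + MVD.
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])

predictor = (33, -12)       # motion vector of a nearby PU
mv = (35, -12)              # actual motion vector of the given PU
mvd = encode_mvd(mv, predictor)   # (2, 0): small values, cheap to code
```

Because nearby PUs tend to have similar motion, the MVD components are usually small numbers, which is why representing the MVD requires fewer coding bits than representing the motion vector itself.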
  • This application may use advanced motion vector prediction mode to refer to the motion information of a given PU by using the MVD and an index value identifying a candidate motion vector.
  • the video encoder 100 may generate a list of candidate predicted motion vectors for a given PU.
  • the candidate prediction motion vector list may include one or more candidate prediction motion vectors.
  • Each of the candidate prediction motion vectors in the candidate prediction motion vector list for a given PU may specify motion information.
  • the motion information indicated by each candidate prediction motion vector may include a motion vector, a reference image index, and a prediction direction identifier.
  • the candidate prediction motion vectors in the candidate prediction motion vector list may include “original” candidate prediction motion vectors, each of which indicates motion information from one of the specified candidate prediction motion vector positions of a PU other than the given PU.
  • the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, a video encoder may compare each candidate prediction motion vector with the PU being decoded and may select a candidate prediction motion vector with a desired code rate-distortion cost. Video encoder 100 may output a candidate prediction motion vector index for a PU. The candidate prediction motion vector index may identify the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
  • the video encoder 100 may generate a predictive image block for a PU based on a reference block indicated by motion information of the PU.
  • the motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU.
  • the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • motion information of a PU may be determined based on a motion vector difference for the PU and motion information indicated by a selected candidate prediction motion vector.
  • Video encoder 100 may process predictive image blocks for a PU as described previously.
  • an identifier bit may be used in the candidate prediction motion vector list to indicate a type of candidate prediction motion vector, so as to control the length of the candidate prediction motion vector list; details are not repeated here.
  • video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU.
  • the candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU.
  • the syntax element parsed by the video decoder 200 from the bitstream may indicate the position of the candidate prediction motion vector selected in the candidate prediction motion vector list of the PU.
  • the video decoder 200 may generate predictive image blocks for the PU based on one or more reference blocks indicated by the motion information of the PU.
  • the video decoder 200 may determine the motion information of the PU from the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU based on the syntax element obtained by parsing the bitstream. Video decoder 200 may reconstruct an image block for a CU based on a predictive image block for a PU and a residual image block for a CU.
  • the candidate prediction motion vector list may use a flag bit to indicate a type of candidate prediction motion vector.
  • the video decoder 200 first parses the bitstream to obtain a first identifier, and the first identifier indicates the position of the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
  • the candidate prediction motion vector list of the PU includes at least one first candidate motion vector and at least one second candidate set, and the second candidate set includes at least one second candidate motion vector.
  • Video decoder 200 determines a target element corresponding to the first identifier from a list of candidate predicted motion vectors of the PU according to the first identifier.
  • if the target element is a first candidate motion vector, the video decoder 200 determines the target element as the target motion information of the PU, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in subsequent decoding processes. If the target element is the second candidate set, the video decoder 200 parses the bitstream to obtain a second identifier, and the second identifier is used to identify a selected candidate prediction motion vector in the second candidate set indicated by the first identifier.
  • the video decoder 200 determines target motion information from a plurality of second candidate motion vectors in the second candidate set indicated by the first identifier according to the second identifier, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in subsequent decoding processes.
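The two-level parsing above can be sketched as follows. This is a minimal illustration, not any codec's API: the names `SecondCandidateSet`, `select_target_motion_vector`, and the example candidate values are all hypothetical.

```python
class SecondCandidateSet:
    """List element standing in for several second candidate motion vectors."""
    def __init__(self, candidates):
        self.candidates = candidates  # list of (vx, vy) motion vectors

def select_target_motion_vector(candidate_list, first_id, read_second_id):
    """Resolve the target motion vector using the first (and, if needed, second) identifier."""
    target = candidate_list[first_id]
    if isinstance(target, SecondCandidateSet):
        # The first identifier points at a set, so a second identifier
        # must be parsed from the bitstream to select within that set.
        second_id = read_second_id()
        return target.candidates[second_id]
    return target  # a first candidate motion vector is used directly

# Index 2 holds a set of two extra candidates; the second identifier picks one.
cand_list = [(4, -1), (0, 3), SecondCandidateSet([(8, 8), (-2, 5)])]
mv = select_target_motion_vector(cand_list, 2, read_second_id=lambda: 1)
print(mv)  # (-2, 5)
```

Because the set occupies a single position in the list, the list length stays bounded regardless of how many second candidates the set contains.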
  • the candidate prediction motion vector list may use a flag bit to indicate a type of candidate prediction motion vector.
  • the video decoder 200 first parses the bitstream to obtain a first identifier, and the first identifier indicates the position of the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
  • the candidate motion vector list of the PU includes at least one first candidate motion vector and a plurality of pieces of second candidate motion information.
  • the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset.
  • Video decoder 200 determines a target element corresponding to the first identifier from a list of candidate predicted motion vectors of the PU according to the first identifier.
  • if the target element is a first candidate motion vector, the video decoder 200 determines the target element as the target motion information of the PU, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in subsequent decoding processes. If the target element is obtained according to a plurality of second candidate motion information, the video decoder 200 parses the bitstream to obtain a second identifier, determines the target motion information based on one of the plurality of second candidate motion information according to the second identifier, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in subsequent decoding processes.
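The offset-based variant can be sketched similarly. The particular values in `PRESET_OFFSETS` and the way the base vector is chosen are assumptions for illustration only, not the preset offsets of any actual codec.

```python
# Preset motion information offsets (illustrative values only).
PRESET_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def derive_target_from_offset(base_mv, second_id):
    """Apply the preset offset selected by the second identifier to a base motion vector."""
    dx, dy = PRESET_OFFSETS[second_id]
    return (base_mv[0] + dx, base_mv[1] + dy)

print(derive_target_from_offset((5, 7), 2))  # (5, 8)
```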
  • candidate motion vectors in the candidate prediction motion vector list may be obtained according to different modes, which are not specifically limited in this application.
  • the construction of the candidate prediction motion vector list and the parsing of the selected candidate prediction motion vector from the bitstream are independent of each other, and may be performed in any order or in parallel.
  • the position of the selected candidate prediction motion vector in the candidate prediction motion vector list is first parsed from the bitstream, and the candidate prediction motion vector list is then constructed based on the parsed position.
  • for example, when the selected candidate prediction motion vector parsed from the bitstream is the candidate with index 3 in the candidate prediction motion vector list, only the part of the list from index 0 to index 3 needs to be constructed to determine the candidate prediction motion vector with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
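The parse-first shortcut above might look like the following sketch, where `generate_candidate` is a placeholder for whatever candidate derivation order the codec defines:

```python
def build_list_up_to(index, generate_candidate):
    """Construct candidates 0..index only, then return the selected one."""
    partial = [generate_candidate(i) for i in range(index + 1)]
    return partial[index]

# With index 3 parsed from the bitstream, only four candidates are built,
# even if the full list would contain more.
selected = build_list_up_to(3, generate_candidate=lambda i: (i, -i))
print(selected)  # (3, -3)
```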
  • FIG. 2 is a block diagram of a video encoder 100 according to an example described in the embodiment of the present application.
  • the video encoder 100 is configured to output a video to the post-processing entity 41.
  • the post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
  • the post-processing entity 41 may be an instance of a network entity.
  • in some cases, the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
  • the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
  • the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded picture buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
  • the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
  • the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
  • the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • the filter unit 106 is shown as an in-loop filter in FIG. 2, in other implementations, the filter unit 106 may be implemented as a post-loop filter.
  • the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
  • the video data memory may store video data to be encoded by the components of the video encoder 100.
  • the video data stored in the video data memory may be obtained from the video source 120.
  • the DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame decoding mode.
  • the video data memory and the DPB 107 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • Video data storage and DPB 107 can be provided by the same storage device or separate storage devices.
  • the video data memory may be on-chip with other components of video encoder 100 or off-chip relative to those components.
  • the video encoder 100 receives video data and stores the video data in a video data memory.
  • the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
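A minimal sketch of the quad-tree style partitioning mentioned above; the stopping criterion (`min_size`) and block sizes are illustrative, not normative:

```python
def quadtree_split(x, y, size, min_size):
    """Return leaf blocks (x, y, size) from recursively quartering a square block."""
    if size <= min_size:
        return [(x, y, size)]
    half = size // 2
    blocks = []
    for dx in (0, half):
        for dy in (0, half):
            blocks += quadtree_split(x + dx, y + dy, half, min_size)
    return blocks

leaves = quadtree_split(0, 0, 64, min_size=32)
print(leaves)  # four 32x32 leaves covering the 64x64 block
```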
  • Video encoder 100 typically illustrates components that encode image blocks within a video slice to be encoded.
  • the slice can be divided into multiple image blocks (and possibly into sets of image blocks referred to as tiles).
  • the prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes.
  • the prediction processing unit 108 may provide the obtained intra-coded or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct an encoded block used as a reference image.
  • the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
  • the inter predictor 110 within the prediction processing unit 108 may perform inter predictive coding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
  • the inter predictor 110 may be configured to determine an inter prediction mode for encoding the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate the rate-distortion values of the various inter prediction modes in the set of candidate inter prediction modes, and select the inter prediction mode with the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original, unencoded block from which the encoded block was produced, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine that the inter prediction mode with the lowest rate-distortion cost for encoding the current image block, among the candidate inter prediction mode set, is the inter prediction mode used for inter prediction of the current image block.
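The rate-distortion selection described above amounts to minimizing a cost J = D + λ·R over the candidate modes. The mode names and the (distortion, bits) numbers below are made up for illustration:

```python
def best_mode(candidates, lam):
    """Return the candidate mode whose cost D + lam * R is smallest."""
    return min(candidates, key=lambda m: m["distortion"] + lam * m["bits"])

modes = [
    {"name": "merge", "distortion": 120.0, "bits": 6},   # cheap to signal
    {"name": "amvp",  "distortion": 100.0, "bits": 30},  # more accurate, costlier
]
print(best_mode(modes, lam=2.0)["name"])  # merge: 120 + 2*6 = 132 < 100 + 2*30 = 160
```

Note how the Lagrange multiplier λ trades signaling cost against distortion: with a small λ the more accurate mode wins instead.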
  • the inter predictor 110 is configured to predict motion information (such as a motion vector) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and to use the motion information (such as the motion vector) of the one or more sub-blocks to obtain or generate a prediction block of the current image block.
  • the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
  • the inter predictor 110 may also generate syntax elements associated with image blocks and video slices for use by the video decoder 200 when decoding image blocks of the video slice.
  • the inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block. It should be understood that the inter predictor 110 here performs motion estimation and motion compensation processes.
  • the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
  • the intra predictor 109 may perform intra prediction on the current image block.
  • the intra predictor 109 may determine an intra prediction mode used to encode the current block.
  • the intra predictor 109 may use rate-distortion analysis to calculate the rate-distortion values of the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from the modes to be tested.
  • in any case, after the intra prediction mode is selected for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
  • the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
  • the summer 112 represents one or more components that perform this subtraction operation.
  • the residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 101.
  • the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
  • the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
  • the transformer 101 may send the obtained transform coefficients to a quantizer 102.
  • the quantizer 102 quantizes the transform coefficients to further reduce the bit rate.
  • the quantizer 102 may then perform a scan of a matrix containing the quantized transform coefficients.
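The coefficient scan mentioned above reorders a 2-D block of quantized coefficients into a 1-D sequence. The anti-diagonal order below is only an illustration, not any codec's normative scan pattern:

```python
def diagonal_scan(block):
    """Flatten an NxN block along anti-diagonals (low-frequency coefficients first)."""
    n = len(block)
    order = []
    for s in range(2 * n - 1):          # each anti-diagonal has constant i + j = s
        for i in range(n):
            j = s - i
            if 0 <= j < n:
                order.append(block[i][j])
    return order

print(diagonal_scan([[9, 2], [3, 0]]))  # [9, 2, 3, 0]
```

Grouping low-frequency (typically nonzero) coefficients at the front tends to produce long zero runs at the tail, which entropy coding exploits.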
  • alternatively, the entropy encoder 103 may perform the scan.
  • after quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
  • the encoded code stream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200.
  • the entropy encoder 103 may also perform entropy encoding.
  • the inverse quantizer 104 and the inverse transformer 105 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference image.
  • the summer 111 adds the reconstructed residual block to a prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
  • the filter unit 106 may be applied to reconstructed image blocks to reduce distortion, such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded picture buffer 107 and can be used by the inter predictor 110 as a reference block to perform inter prediction on blocks in subsequent video frames or images.
  • in addition, the video encoder 100 may directly quantize the residual signal without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data, and correspondingly does not need processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; or, the video encoder 100 may store the reconstructed image blocks directly as reference blocks without processing by the filter unit 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
  • FIG. 3 is a block diagram of an example video decoder 200 described in the embodiment of the present application.
  • the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a DPB 207.
  • the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
  • video decoder 200 may perform a decoding process that is substantially inverse to the encoding process described with respect to video encoder 100 from FIG. 2.
  • the video decoder 200 receives from the video encoder 100 an encoded video codestream representing image blocks of the encoded video slice and associated syntax elements.
  • the video decoder 200 may receive video data from the network entity 42, optionally, the video data may also be stored in a video data storage (not shown in the figure).
  • the video data memory may store video data, such as an encoded video code stream, to be decoded by components of the video decoder 200.
  • the video data stored in the video data memory can be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of the video data, or by accessing a physical data storage medium.
  • the video data memory can serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories. The video data memory and the DPB 207 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200, or provided off-chip relative to those components.
  • the network entity 42 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
  • the network entity 42 may or may not include a video encoder, such as video encoder 100.
  • the network entity 42 may implement some of the techniques described in this application.
  • the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
  • the network entity 42 may be an example of the storage device 40 of FIG. 1.
  • the entropy decoder 203 of the video decoder 200 entropy decodes the code stream to produce quantized coefficients and some syntax elements.
  • the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
  • Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
  • the intra predictor 209 of the prediction processing unit 208 may generate prediction blocks for image blocks of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
  • the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block based on the determined inter prediction mode (for example, perform inter prediction).
  • specifically, the inter predictor 210 may determine whether to use a new inter prediction mode to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used to predict the current image block, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block of the current video slice or of a sub-block of the current image block, so that the motion information of the current image block or its sub-block is used to obtain or generate a prediction block of the current image block or its sub-block through a motion compensation process.
  • the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
  • a prediction block may be generated from one of reference pictures within one of the reference picture lists.
  • the video decoder 200 may construct a reference image list, that is, a list 0 and a list 1, based on the reference images stored in the DPB 207.
  • the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
  • the video encoder 100 may signal whether a new inter prediction mode is used to decode a specific syntax element of a specific block, or may signal both whether a new inter prediction mode is used and which new inter prediction mode is used to decode a specific syntax element of a specific block. It should be understood that the inter predictor 210 here performs a motion compensation process.
  • the inverse quantizer 204 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the code stream and decoded by the entropy decoder 203.
  • the inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
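As a rough sketch of that inverse quantization step: the quantization parameter (QP) determines a step size, and each quantized coefficient level is scaled back by it. The QP-to-step mapping below (doubling every 6 QP units, in the style of H.264/HEVC) is an illustrative approximation, not this document's normative rule:

```python
def qstep(qp):
    """Approximate quantization step size for a given QP (doubles every 6 QP units)."""
    return 0.625 * (2 ** (qp / 6))

def dequantize(levels, qp):
    """Scale quantized transform coefficient levels back toward original magnitudes."""
    step = qstep(qp)
    return [lvl * step for lvl in levels]

print(dequantize([4, -2, 0, 1], qp=12))  # each level scaled by 2.5
```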
  • the inverse transformer 205 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process to generate a residual block in the pixel domain.
  • the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210 to obtain the reconstructed block, that is, the decoded image block.
  • the summer 211 represents a component that performs this summing operation.
  • if needed, a loop filter (in or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality.
  • the filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
  • the filter unit 206 is shown as an in-loop filter in FIG. 3, in other implementations, the filter unit 206 may be implemented as a post-loop filter.
  • the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as the decoded video stream.
  • the decoded image block in a given frame or image can also be stored in the decoded image buffer 207, and the reference image used for subsequent motion compensation can be stored via the DPB 207.
  • the DPB 207 may be part of the memory, which may also store the decoded video for later presentation on a display device (such as the display device 220 of FIG. 1), or may be separate from such memory.
  • the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and accordingly does not need processing by the inverse quantizer 204 and the inverse transformer 205.
  • the techniques of this application exemplarily involve inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video decoders described in this application.
  • the video decoder includes, for example, the video encoder 100 and the video decoder 200 as shown and described with respect to FIGS. 1-3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data.
  • a reference to a generic "video encoder" or "video decoder” may include video encoder 100, video decoder 200, or another video encoding or coding unit.
  • the processing result of a certain step may be further processed and then output to the next step; for example, after steps such as interpolation filtering, motion vector derivation, or loop filtering, the result of the corresponding step is further clipped or shifted.
  • the motion vector of the control point of the current image block derived according to the motion vector of the adjacent affine coding block may be further processed, which is not limited in this application.
  • the value range of the motion vector is restricted so that it fits within a certain bit width. Assuming that the allowed bit width of the motion vector is bitDepth, the range of the motion vector is -2^(bitDepth-1) to 2^(bitDepth-1) - 1, where the "^" symbol represents exponentiation. If bitDepth is 16, the value range is -32768 to 32767; if bitDepth is 18, the value range is -131072 to 131071. The constraint can be implemented in either of the following two ways:
  • Method 1: remove the overflowed high-order bits by wrap-around:
  ux = (vx + 2^bitDepth) % 2^bitDepth
  vx = (ux >= 2^(bitDepth-1)) ? (ux - 2^bitDepth) : ux
  (and likewise for uy and vy)
  • for example, if the value of vx is -32769, the value obtained by the above formulas is 32767. In a computer, values are stored in two's-complement form; the two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits). The computer treats the overflow by discarding the high-order bit, so the value of vx becomes 0111,1111,1111,1111, that is, 32767, which is consistent with the result obtained by the formulas.
  • Method 2: vx = Clip3(-2^(bitDepth-1), 2^(bitDepth-1) - 1, vx)
  • vy = Clip3(-2^(bitDepth-1), 2^(bitDepth-1) - 1, vy)
  • Clip3 clamps the value of z to the interval [x, y]: Clip3(x, y, z) = x if z < x; y if z > y; and z otherwise.
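The two constraint methods can be sketched directly in code (a minimal Python illustration; with bitDepth = 16 the wrap-around method reproduces the -32769 → 32767 behavior described above, while Clip3 instead saturates at the interval bounds):

```python
def wrap_mv(v, bit_depth):
    """Method 1: modular wrap-around (discard overflowed high-order bits)."""
    u = (v + (1 << bit_depth)) % (1 << bit_depth)
    return u - (1 << bit_depth) if u >= (1 << (bit_depth - 1)) else u

def clip3(x, y, z):
    """Clamp z to the interval [x, y]."""
    return x if z < x else y if z > y else z

def clamp_mv(v, bit_depth):
    """Method 2: saturating clamp to [-2^(bitDepth-1), 2^(bitDepth-1) - 1]."""
    return clip3(-(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1, v)

print(wrap_mv(-32769, 16))   # 32767
print(clamp_mv(-32769, 16))  # -32768
```

The two methods agree on in-range values and differ only on overflow: wrap-around reinterprets the low-order bits, while clipping saturates.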
  • FIG. 4 is a schematic block diagram of an inter prediction module 121 according to an embodiment of the present application.
  • the inter prediction module 121 may include a motion estimation unit and a motion compensation unit. The relationship between PU and CU is different in different video compression codecs.
  • the inter prediction module 121 may partition a current CU into a PU according to a plurality of partitioning modes. For example, the inter prediction module 121 may partition a current CU into a PU according to 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, and N ⁇ N partition modes. In other embodiments, the current CU is the current PU, which is not limited.
  • the inter prediction module 121 may perform integer motion estimation (IME) and then perform fractional motion estimation (FME) on each of the PUs.
  • the inter prediction module 121 may search a reference block for a PU in one or more reference images. After the reference block for the PU is found, the inter prediction module 121 may generate a motion vector indicating the spatial displacement between the PU and the reference block for the PU with integer precision.
  • the inter prediction module 121 may improve a motion vector generated by performing IME on the PU.
  • a motion vector generated by performing FME on a PU may have sub-integer precision (eg, 1/2 pixel precision, 1/4 pixel precision, etc.).
  • the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
  • the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
  • the candidate prediction motion vector list may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
  • the inter prediction module 121 may select the candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference (MVD) for the PU.
  • the MVD for a PU may indicate a difference between a motion vector indicated by a selected candidate prediction motion vector and a motion vector generated for the PU using IME and FME.
  • the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
  • the inter prediction module 121 may also output the MVD of the PU.
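The AMVP-style signaling just described (select a predictor from the list, transmit its index plus the MVD) can be sketched as follows. The cost metric used to pick the predictor (absolute component sum of the MVD) is an assumption for illustration:

```python
def encode_amvp(mv, predictors):
    """Return (index, mvd) for the predictor giving the smallest MVD."""
    costs = [abs(mv[0] - p[0]) + abs(mv[1] - p[1]) for p in predictors]
    idx = costs.index(min(costs))
    mvd = (mv[0] - predictors[idx][0], mv[1] - predictors[idx][1])
    return idx, mvd

def decode_amvp(idx, mvd, predictors):
    """Reconstruct the motion vector from the predictor index and the MVD."""
    p = predictors[idx]
    return (p[0] + mvd[0], p[1] + mvd[1])

preds = [(10, 4), (6, 6)]
idx, mvd = encode_amvp((9, 5), preds)  # idx 0, mvd (-1, 1)
assert decode_amvp(idx, mvd, preds) == (9, 5)
```

Both sides must derive the same predictor list for the index to be meaningful, which is why the encoder and decoder build their candidate lists identically.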
  • a detailed implementation of the advanced motion vector prediction (AMVP) mode in this embodiment of the present application is described in detail below with reference to FIG. 6.
  • the inter prediction module 121 may also perform a merge operation on each of the PUs.
  • the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
  • the candidate prediction motion vector list for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
  • the original candidate prediction motion vector in the candidate prediction motion vector list may include one or more spatial candidate prediction motion vectors and temporal candidate prediction motion vectors.
  • the spatial candidate prediction motion vector may indicate motion information of other PUs in the current image.
  • the temporal candidate prediction motion vector may be based on motion information of a corresponding PU different from the current picture.
  • the temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
  • the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
  • FIG. 5, described below, illustrates an exemplary flowchart of the merge mode.
  • the original candidate prediction motion vectors can be directly included in the candidate prediction motion vector list, and a type of additional candidate prediction motion vector is indicated through an identifier bit, so as to control the length of the candidate prediction motion vector list. In particular, different types of additional candidate prediction motion vectors are indicated by different identifier bits.
  • a prediction motion vector is selected from the set of extra candidate prediction motion vectors indicated by the identification bit.
  • the candidate prediction motion vector indicated by the identification bit may be a preset motion information offset.
  • the inter prediction module 121 may select either the predictive image block generated through the FME operation or the predictive image block generated through the merge operation. In some feasible implementations, the inter prediction module 121 may select a predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
  • the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may select the partitioning mode based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes.
  • the inter prediction module 121 may output a predictive image block associated with a PU belonging to the selected partition mode to the residual generation module 102.
  • the inter prediction module 121 may output a syntax element indicating motion information of a PU belonging to the selected partition mode to the entropy encoding module.
• the inter prediction module 121 includes IME modules 180A to 180N (collectively referred to as "IME module 180"), FME modules 182A to 182N (collectively referred to as "FME module 182"), merge modules 184A to 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A to 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a mode decision process from CTU to CU).
  • the IME module 180, the FME module 182, and the merge module 184 may perform an IME operation, an FME operation, and a merge operation on a PU of the current CU.
  • the inter prediction module 121 is illustrated in the schematic diagram of FIG. 4 as including a separate IME module 180, an FME module 182, and a merging module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, an FME module 182, and a merge module 184 for each PU of each partitioning mode of the CU.
  • the IME module 180A, the FME module 182A, and the merge module 184A may perform IME operations, FME operations, and merge operations on a PU generated by dividing a CU according to a 2N ⁇ 2N split mode.
  • the PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
  • the IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on a left PU generated by dividing a CU according to an N ⁇ 2N division mode.
  • the PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
  • the IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on a right PU generated by dividing a CU according to an N ⁇ 2N division mode.
  • the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
• the IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on a lower right PU generated by dividing a CU according to an N×N division mode.
  • the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
• the PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of a plurality of possible predictive image blocks, selecting the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-constrained applications, the PU mode decision module 186 may prefer predictive image blocks that increase the compression ratio, while for other applications it may prefer predictive image blocks that increase the quality of the reconstructed video.
• the CU mode decision module 188 selects a partitioning mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partitioning mode.
  • FIG. 5 is an implementation flowchart of a merge mode in an embodiment of the present application.
• a video encoder (e.g., video encoder 20) may perform the merge operation 200.
  • the merging operation 200 may include: 202. Generate a candidate list for a current prediction unit. 204. Generate a predictive video block associated with a candidate in the candidate list. 206. Select a candidate from the candidate list. 208. Output candidates.
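• The four steps of the merge operation 200 can be sketched as follows; the stub encoder, its candidate motion vectors, and its cost function are invented stand-ins for the real rate-distortion machinery.

```python
class StubEncoder:
    # Toy stand-in for the encoder internals (all values invented).
    def generate_candidate_list(self, pu):
        return [(0, 0), (2, 1), (5, 5)]            # candidate motion vectors
    def predict(self, pu, mv):
        return mv                                   # the "block" is just its MV here
    def rd_cost(self, block):
        # Pretend the block predicted by (2, 1) matches the source best.
        return abs(block[0] - 2) + abs(block[1] - 1)

def merge_operation_200(current_pu, encoder):
    candidates = encoder.generate_candidate_list(current_pu)        # step 202
    blocks = [encoder.predict(current_pu, c) for c in candidates]   # step 204
    merge_idx = min(range(len(candidates)),
                    key=lambda i: encoder.rd_cost(blocks[i]))       # step 206
    return merge_idx                                                # step 208

assert merge_operation_200(None, StubEncoder()) == 1
```

The returned index plays the role of the "merge_idx" output in step 208.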
  • the candidate refers to a candidate motion vector or candidate motion information.
  • the video encoder may perform a merge operation different from the merge operation 200.
  • the video encoder may perform a merge operation, where the video encoder performs more or fewer steps than the merge operation 200 or steps different from the merge operation 200.
  • the video encoder may perform the steps of the merge operation 200 in a different order or in parallel.
  • the encoder may also perform a merge operation 200 on a PU encoded in a skip mode.
  • the video encoder may generate a list of candidate predicted motion vectors for the current PU (202).
  • the video encoder may generate a list of candidate prediction motion vectors for the current PU in various ways.
  • the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one of the example techniques described below with respect to FIGS. 8-12.
  • the candidate prediction motion vector list for the current PU includes at least one first candidate motion vector and at least one second candidate motion vector set identifier.
  • the candidate prediction motion vector list for the current PU may include a temporal candidate prediction motion vector.
  • the temporal candidate prediction motion vector may indicate motion information of a co-located PU in the time domain.
  • a co-located PU may be spatially in the same position in the image frame as the current PU, but in a reference picture instead of the current picture.
  • a reference picture including a PU corresponding to the time domain may be referred to as a related reference picture.
  • a reference image index of a related reference image may be referred to as a related reference image index in this application.
  • the current image may be associated with one or more reference image lists (eg, list 0, list 1, etc.).
  • the reference image index may indicate a reference image by indicating a position in a reference image list of the reference image.
  • the current image may be associated with a combined reference image list.
  • the related reference picture index is the reference picture index of the PU covering the reference index source position associated with the current PU.
  • the reference index source location associated with the current PU is adjacent to the left of the current PU or above the current PU.
• when the image block associated with a PU includes a specific location, the PU may "cover" that specific location.
  • the video encoder can use a zero reference picture index.
  • the reference index source location associated with the current PU is within the current CU.
• the video encoder may need to access motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use the motion information (i.e., the reference picture index) of a PU belonging to the current CU to generate a temporal candidate prediction motion vector for the current PU. In other words, these video encoders may use motion information of a PU belonging to the current CU to generate the temporal candidate prediction motion vector. Therefore, the video encoder may not be able to generate candidate prediction motion vector lists in parallel for the current PU and the PU covering the reference index source position associated with the current PU.
  • the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations where the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (eg, 0).
  • the video encoder may generate a temporal candidate prediction motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate prediction motion vector in the candidate prediction of the current CU List of motion vectors.
• the video encoder may explicitly signal the related reference picture index in a syntax structure (e.g., an image header, a slice header, an APS, or another syntax structure).
• the video encoder may signal the relevant reference picture index to the decoder for each LCU (i.e., CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of the CU is equal to "1".
  • the relevant reference image index may be set implicitly rather than explicitly.
• the video encoder may use the motion information of PUs in the reference images indicated by the reference image indexes of PUs covering locations outside the current CU to generate each temporal candidate prediction motion vector in the candidate prediction motion vector lists of the PUs of the current CU, even if these locations are not strictly adjacent to the current PU.
  • the video encoder may generate predictive image blocks associated with the candidate prediction motion vectors in the candidate prediction motion vector list (204).
• the video encoder may generate the predictive image block associated with a candidate prediction motion vector by determining the motion information of the current PU based on the motion information indicated by the candidate prediction motion vector and then generating the predictive image block based on one or more reference blocks indicated by the motion information of the current PU.
  • the video encoder may then select one of the candidate prediction motion vectors from the candidate prediction motion vector list (206).
  • the video encoder can select candidate prediction motion vectors in various ways. For example, a video encoder may select one of the candidate prediction motion vectors based on a code rate-distortion cost analysis of each of the predictive image blocks associated with the candidate prediction motion vector.
  • the video encoder may output a candidate prediction motion vector index (208).
  • the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
  • the candidate prediction motion vector index may be represented as "merge_idx".
  • FIG. 6 is an implementation flowchart of an advanced motion vector prediction (AMVP) mode in an embodiment of the present application.
• a video encoder (e.g., video encoder 20) may perform the AMVP operation 210.
• the AMVP operation 210 may include: 211. Generate one or more motion vectors for the current prediction unit. 212. Generate a predictive video block for the current prediction unit. 213. Generate a candidate list for the current prediction unit. 214. Generate motion vector differences. 215. Select a candidate from the candidate list. 216. Output a reference picture index, a candidate index, and a motion vector difference for the selected candidate.
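• Steps 213 to 216 for a uni-directionally predicted PU can be sketched as follows. The absolute-difference cost is only a stand-in for the number of bits needed to code the MVD, and all numeric values are illustrative.

```python
def amvp_operation_210(true_mv, candidate_mvps, ref_idx):
    # Select the predictor whose MVD is cheapest to code, then output
    # (reference index, candidate index, MVD) as in step 216.
    def mvd(pred):
        return (true_mv[0] - pred[0], true_mv[1] - pred[1])
    def cost(pred):
        d = mvd(pred)
        return abs(d[0]) + abs(d[1])     # proxy for the bits coding the MVD
    best = min(range(len(candidate_mvps)),
               key=lambda i: cost(candidate_mvps[i]))          # step 215
    return ref_idx, best, mvd(candidate_mvps[best])

# true_mv plays the role of the motion vector found by motion estimation
# in step 211; the two predictors stand for the candidate list of step 213.
out = amvp_operation_210(true_mv=(9, -3),
                         candidate_mvps=[(0, 0), (8, -2)],
                         ref_idx=0)
assert out == (0, 1, (1, -1))
```

Choosing the predictor closest to the true motion vector minimizes the MVD magnitude, which is the selection criterion described for step 215.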
  • the candidate refers to a candidate motion vector or candidate motion information.
  • the video encoder may generate one or more motion vectors for the current PU (211).
  • the video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU.
  • the current image may be associated with two reference image lists (List 0 and List 1).
  • the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU.
  • the list 0 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 0.
  • the list 1 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 1.
  • the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
  • the video encoder may generate predictive image blocks for the current PU (212).
  • the video encoder may generate predictive image blocks for the current PU based on one or more reference blocks indicated by one or more motion vectors for the current PU.
  • the video encoder may generate a list of candidate predicted motion vectors for the current PU (213).
• the video encoder may generate the list of candidate prediction motion vectors for the current PU in various ways.
  • the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one or more of the possible implementations described below with respect to FIGS. 8 to 12.
  • the list of candidate prediction motion vectors may be limited to two candidate prediction motion vectors.
  • the list of candidate prediction motion vectors may include more candidate prediction motion vectors (eg, five candidate prediction motion vectors).
  • the video encoder may generate one or more motion vector differences (MVD) for each candidate prediction motion vector in the list of candidate prediction motion vectors (214).
  • the video encoder may generate a motion vector difference for the candidate prediction motion vector by determining a difference between the motion vector indicated by the candidate prediction motion vector and a corresponding motion vector of the current PU.
• if the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bi-directionally predicted, the video encoder may generate two MVDs for each candidate prediction motion vector.
  • the first MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 0 motion vector of the current PU.
  • the second MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 1 motion vector of the current PU.
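• A minimal sketch of the MVD generation described above, assuming the same candidate motion vector is compared against the list 0 and (for a bi-predicted PU) list 1 motion vectors of the current PU:

```python
def motion_vector_differences(candidate_mv, pu_list0_mv, pu_list1_mv=None):
    # One MVD for a uni-predicted PU, two MVDs for a bi-predicted PU.
    def diff(mv, pred):
        return (mv[0] - pred[0], mv[1] - pred[1])
    mvds = [diff(pu_list0_mv, candidate_mv)]          # first MVD (list 0)
    if pu_list1_mv is not None:
        mvds.append(diff(pu_list1_mv, candidate_mv))  # second MVD (list 1)
    return mvds

assert motion_vector_differences((1, 1), (3, 4)) == [(2, 3)]
assert motion_vector_differences((1, 1), (3, 4), (0, -2)) == [(2, 3), (-1, -3)]
```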
  • the video encoder may select one or more of the candidate prediction motion vectors from the candidate prediction motion vector list (215).
  • the video encoder may select one or more candidate prediction motion vectors in various ways. For example, a video encoder may select a candidate prediction motion vector with an associated motion vector that matches the motion vector to be encoded with minimal error, which may reduce the number of bits required to represent the motion vector difference for the candidate prediction motion vector.
• the video encoder may output one or more reference image indexes for the current PU, one or more candidate prediction motion vector indexes, and one or more motion vector differences for the one or more selected candidate prediction motion vectors (216).
• the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1").
• the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
• the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
• the video encoder may also output the MVD of the list 0 motion vector or the list 1 motion vector for the current PU.
• the video encoder may output the reference picture index for list 0 ("ref_idx_l0") and the reference picture index for list 1 ("ref_idx_l1").
• the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
• the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
• the video encoder may also output the MVD of the list 0 motion vector for the current PU and the MVD of the list 1 motion vector for the current PU.
  • FIG. 7 is an implementation flowchart of motion compensation performed by a video decoder (such as video decoder 30) in an embodiment of the present application.
  • the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
  • the video decoder may receive the first candidate prediction motion vector index and the second candidate prediction motion vector index.
  • the first candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
  • the second candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
  • a single syntax element may be used to identify two candidate prediction motion vector indexes.
• the video decoder may accept a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU, or accept an identifier indicating the position, within the candidate prediction motion vector list of the current PU, of the classification to which the selected candidate prediction motion vector belongs, together with an index of the position of the selected candidate prediction motion vector within its classification.
  • the video decoder may generate a list of candidate predicted motion vectors for the current PU (224).
  • the video decoder may generate this candidate prediction motion vector list for the current PU in various ways.
  • the video decoder may use the techniques described below with reference to FIGS. 8 to 12 to generate a list of candidate prediction motion vectors for the current PU.
• the video decoder may explicitly or implicitly set a reference image index identifying the reference image that includes the co-located PU, as described above with respect to FIG. 5.
  • a type of candidate prediction motion vector may be indicated by an identification bit in the candidate prediction motion vector list to control the length of the candidate prediction motion vector list.
• the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidate prediction motion vectors in the candidate prediction motion vector list of the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may use the one or more motion vectors indicated by the one or more selected candidate prediction motion vectors and the one or more MVDs indicated in the code stream to reconstruct the one or more motion vectors of the current PU.
  • the reference image index and prediction direction identifier of the current PU may be the same as the reference image index and prediction direction identifier of the one or more selected candidate prediction motion vectors.
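• The two reconstruction rules described above (merge copies the candidate's motion vector; AMVP adds the signaled MVD to the selected predictor) can be sketched as:

```python
def decode_motion_vector(mode, selected_candidate_mv, mvd=None):
    # Merge mode: the PU inherits the candidate's motion vector as-is.
    if mode == "merge":
        return selected_candidate_mv
    # AMVP mode: the PU's motion vector is predictor + signaled MVD.
    if mode == "amvp":
        return (selected_candidate_mv[0] + mvd[0],
                selected_candidate_mv[1] + mvd[1])
    raise ValueError("unknown mode: " + mode)

assert decode_motion_vector("merge", (4, -2)) == (4, -2)
assert decode_motion_vector("amvp", (4, -2), mvd=(1, 3)) == (5, 1)
```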
  • the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
• FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and adjacent image blocks associated with it in an embodiment of the present application, illustrating CU 250 and exemplary candidate prediction motion vector positions 252A to 252E associated with CU 250.
  • This application may collectively refer to the candidate prediction motion vector positions 252A to 252E as the candidate prediction motion vector positions 252.
  • the candidate prediction motion vector position 252 indicates a spatial candidate prediction motion vector in the same image as the CU 250.
  • the candidate prediction motion vector position 252A is positioned to the left of CU250.
  • the candidate prediction motion vector position 252B is positioned above the CU250.
  • the candidate prediction motion vector position 252C is positioned at the upper right of CU250.
  • the candidate prediction motion vector position 252D is positioned at the lower left of CU250.
  • the candidate prediction motion vector position 252E is positioned at the upper left of the CU250.
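• For illustration, one representative sample coordinate per neighbour position might be computed as below for a CU with top-left corner (x, y) and size w×h; the exact sample offsets are an assumption of this sketch, not taken from the figure.

```python
def spatial_candidate_positions(x, y, w, h):
    # One representative sample per neighbour position (252A-252E) of a CU
    # whose top-left corner is (x, y) and whose size is w x h.
    return {
        "252A_left":        (x - 1,     y + h - 1),
        "252B_above":       (x + w - 1, y - 1),
        "252C_above_right": (x + w,     y - 1),
        "252D_below_left":  (x - 1,     y + h),
        "252E_above_left":  (x - 1,     y - 1),
    }

pos = spatial_candidate_positions(16, 16, 8, 8)
assert pos["252A_left"] == (15, 23)
assert pos["252C_above_right"] == (24, 15)
assert pos["252E_above_left"] == (15, 15)
```

A PU "covering" one of these samples supplies the motion information for the corresponding spatial candidate.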
• FIG. 8 is a schematic illustration of a manner in which the inter prediction module 121 and the motion compensation module can generate a candidate prediction motion vector list. The implementations will be explained below with reference to the inter prediction module 121, but it should be understood that the motion compensation module can implement the same technique and thus generate the same candidate prediction motion vector list.
  • FIG. 9 is a flowchart of constructing a candidate prediction motion vector list according to an embodiment of the present application.
  • the technique of FIG. 9 will be described with reference to a list including five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes.
  • the five candidate prediction motion vectors may each have an index (eg, 0 to 4).
  • the technique of FIG. 9 will be described with reference to a general video decoder.
  • a general video decoder may be, for example, a video encoder (such as video encoder 20) or a video decoder (such as video decoder 30).
• the candidate prediction motion vector list constructed based on the technology of the present application is described in detail in the following embodiments, and is not repeated here.
  • the video decoder first considers four spatial candidate prediction motion vectors (902).
  • the four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D.
  • the four spatial candidate prediction motion vectors correspond to motion information of four PUs in the same image as the current CU (for example, CU250).
  • the video decoder may consider the four spatial candidate prediction motion vectors in the list in a particular order. For example, the candidate prediction motion vector position 252A may be considered first. If the candidate prediction motion vector position 252A is available, the candidate prediction motion vector position 252A may be assigned to index 0.
• if the candidate prediction motion vector position 252A is unavailable, the video decoder may not include the candidate prediction motion vector position 252A in the candidate prediction motion vector list.
  • Candidate prediction motion vector positions may be unavailable for various reasons. For example, if the candidate prediction motion vector position is not within the current image, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is intra-predicted, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is in a slice different from the current CU, the candidate prediction motion vector position may not be available.
  • the video decoder may next consider the candidate prediction motion vector position 252B. If the candidate prediction motion vector position 252B is available and different from the candidate prediction motion vector position 252A, the video decoder may add the candidate prediction motion vector position 252B to the candidate prediction motion vector list.
  • the terms "same” and “different” refer to motion information associated with candidate predicted motion vector locations. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information, and are considered different if they have different motion information. If the candidate prediction motion vector position 252A is not available, the video decoder may assign the candidate prediction motion vector position 252B to index 0.
• the video decoder may assign the candidate prediction motion vector position 252B to index 1. If the candidate prediction motion vector position 252B is not available or is the same as the candidate prediction motion vector position 252A, the video decoder skips the candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
• the candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If the candidate prediction motion vector position 252C is available and not the same as the candidate prediction motion vector positions 252B and 252A, the video decoder assigns the candidate prediction motion vector position 252C to the next available index. If the candidate prediction motion vector position 252C is unavailable or is the same as one of the candidate prediction motion vector positions 252A and 252B, the video decoder does not include the candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers the candidate prediction motion vector position 252D.
• if the candidate prediction motion vector position 252D is available and not the same as the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns the candidate prediction motion vector position 252D to the next available index. If the candidate prediction motion vector position 252D is unavailable or is the same as one of the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include the candidate prediction motion vector position 252D in the candidate prediction motion vector list.
• the above example describes considering the candidate prediction motion vectors 252A to 252D individually for inclusion in the candidate prediction motion vector list, but in some embodiments, all candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list afterwards.
  • the candidate prediction motion vector list may include four spatial candidate prediction motion vectors or the list may include less than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, Yes), the video decoder considers temporal candidate prediction motion vectors (906).
  • the temporal candidate prediction motion vector may correspond to motion information of a co-located PU of a picture different from the current picture. If a temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4.
• if the temporal candidate prediction motion vector is unavailable or is the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, Yes), the video decoder completes building the list.
• if the candidate prediction motion vector list includes fewer than five candidate prediction motion vectors (908, No), the video decoder may consider a fifth spatial candidate prediction motion vector (910).
• the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, and the fifth spatial candidate prediction motion vector is assigned to index 4.
• if the candidate prediction motion vector at position 252E is unavailable or is the same as one of the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. So after considering the fifth spatial candidate prediction motion vector (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
• if the candidate prediction motion vector list includes five candidate prediction motion vectors (912, Yes), the video decoder finishes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
• if the candidate prediction motion vector list includes fewer than four spatial candidate prediction motion vectors (904, No), the video decoder may consider the fifth spatial candidate prediction motion vector (918).
• the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, and the fifth spatial candidate prediction motion vector is assigned to the next available index.
• if the candidate prediction motion vector at position 252E is unavailable or is the same as one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list.
• the video decoder may then consider the temporal candidate prediction motion vector (920). If the temporal candidate prediction motion vector is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the temporal candidate prediction motion vector to the candidate prediction motion vector list, and the temporal candidate prediction motion vector is assigned to the next available index. If the temporal candidate prediction motion vector is unavailable or is the same as one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include the temporal candidate prediction motion vector in the candidate prediction motion vector list.
• If the candidate prediction motion vector list includes five candidate prediction motion vectors (922, Yes), the video decoder finishes generating the candidate prediction motion vector list. If the list includes fewer than five candidate prediction motion vectors (922, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
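The fill-to-five flow described above (blocks 902 to 922) can be sketched as follows. This is a minimal illustration, not the embodiment itself: all function and variable names are placeholders, and real candidate availability checks are far more involved.

```python
# Sketch of the merge-list flow: collect spatial candidates, then the temporal
# candidate, pruning duplicates, and pad with artificially generated candidates
# until the list holds exactly five entries (blocks 914/916 above).
MAX_CANDIDATES = 5

def add_if_new(candidate_list, mv):
    """Append mv only if it is available (not None) and not already listed."""
    if mv is not None and mv not in candidate_list:
        candidate_list.append(mv)

def build_merge_list(spatial_mvs, temporal_mv, generate_artificial):
    candidates = []
    for mv in spatial_mvs:  # the four spatial positions plus the fifth (252E)
        if len(candidates) == MAX_CANDIDATES:
            break
        add_if_new(candidates, mv)
    if len(candidates) < MAX_CANDIDATES:
        add_if_new(candidates, temporal_mv)
    while len(candidates) < MAX_CANDIDATES:  # pad with artificial candidates
        candidates.append(generate_artificial(len(candidates)))
    return candidates
```

Each appended candidate receives the next available index simply by its position in the list, mirroring the index assignment described above.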
• An additional merge candidate prediction motion vector may be artificially generated after the spatial and temporal candidate prediction motion vectors, so that the size of the merge candidate prediction motion vector list is fixed to the specified number of merge candidate prediction motion vectors (for example, five, as in the possible implementation of FIG. 9 above).
• Additional merge candidate prediction motion vectors may include an exemplary combined bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 1), a scaled bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 2), and a zero-vector merge/AMVP candidate prediction motion vector (candidate prediction motion vector 3).
  • a spatial candidate prediction motion vector and a temporal candidate prediction motion vector may be directly included in the candidate prediction motion vector list, and an artificially generated additional merge candidate prediction motion vector is indicated in the candidate prediction motion vector list through an identification bit.
  • FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
  • the combined bi-directional predictive merge candidate prediction motion vector may be generated by combining the original merge candidate prediction motion vector.
  • two candidate prediction motion vectors (which have mvL0_A and ref0 or mvL1_B and ref0) among the original candidate prediction motion vectors may be used to generate a bidirectional predictive merge candidate prediction motion vector.
  • two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
  • the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
  • the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
  • mvL0_A and ref0 are picked from list 0
  • mvL1_B and ref0 are picked from list 1
• A bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may be generated, and the video decoder may check whether it is different from the candidate prediction motion vectors already included in the candidate prediction motion vector list. If it is different, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
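The combination step above can be sketched as follows. Names such as `cand_l0`/`cand_l1` are illustrative, and the candidate is modeled as a simple pair of (motion vector, reference index) per list:

```python
# Sketch: combine a list-0 unidirectional candidate (mvL0_A, ref0) and a
# list-1 unidirectional candidate (mvL1_B, ref0) into one bi-predictive merge
# candidate, adding it only if it is not a duplicate of an existing entry.
def combine_bi_predictive(cand_l0, cand_l1, merge_list):
    """cand_l0 = (mv, ref_idx) picked from list 0; cand_l1 likewise from list 1."""
    combined = {'L0': cand_l0, 'L1': cand_l1}
    if combined not in merge_list:  # duplicate check before inclusion
        merge_list.append(combined)
    return merge_list
```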
  • FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
  • the scaled bi-directional predictive merge candidate prediction motion vector may be generated by scaling the original merge candidate prediction motion vector.
  • a candidate prediction motion vector (which may have mvL0_A and ref0 or mvL1_A and ref1) from the original candidate prediction motion vector may be used to generate a bidirectional predictive merge candidate prediction motion vector.
  • two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
  • the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
  • the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
• mvL0_A and ref0 may be picked from list 0, and ref0 may be copied to the reference index ref0′ in list 1. Then, mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0′. The scaling may depend on the POC distance.
  • a bi-directional predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0 'in list 1) can be generated and checked if it is a duplicate. If it is not duplicate, it can be added to the merge candidate prediction motion vector list.
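The POC-distance scaling just described can be sketched as below. This is a simplified illustration: a real codec performs this in fixed-point arithmetic with rounding and clipping, which are omitted here.

```python
# Sketch: scale mvL0_A (taken with reference ref0 in list 0) by the ratio of
# POC distances to produce mvL0'_A for the copied reference ref0' in list 1.
def scale_mv_by_poc(mv, cur_poc, ref_poc, target_ref_poc):
    dist = cur_poc - ref_poc                 # distance to the original reference
    target_dist = cur_poc - target_ref_poc   # distance to the copied reference
    scale = target_dist / dist
    return (mv[0] * scale, mv[1] * scale)
```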
  • FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
  • the zero vector merge candidate prediction motion vector may be generated by combining the zero vector with a reference index that can be referred to. If the zero vector candidate prediction motion vector is not duplicated, it can be added to the merge candidate prediction motion vector list. For each generated merge candidate prediction motion vector, the motion information may be compared with the motion information of the previous candidate prediction motion vector in the list.
• The pruning operation may include comparing one or more new candidate prediction motion vectors with the candidate prediction motion vectors already in the candidate prediction motion vector list, and not adding a new candidate prediction motion vector that duplicates a candidate prediction motion vector already in the list.
  • the pruning operation may include adding one or more new candidate prediction motion vectors to a list of candidate prediction motion vectors and removing duplicate candidate prediction motion vectors from the list later.
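The two pruning variants above can be sketched side by side; both yield the same duplicate-free list, differing only in when the comparison happens. Function names are placeholders:

```python
# Variant 1: compare before adding; only append a candidate not already present.
def prune_before_add(existing, new_candidates):
    for cand in new_candidates:
        if cand not in existing:
            existing.append(cand)
    return existing

# Variant 2: append everything first, then remove duplicates, keeping order.
def prune_after_add(existing, new_candidates):
    merged = existing + new_candidates
    deduped = []
    for cand in merged:
        if cand not in deduped:
            deduped.append(cand)
    return deduped
```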
• A newly generated candidate prediction motion vector may be treated as one category of candidate motion vectors, and that category of newly generated candidate prediction motion vectors is indicated by an identification bit in the original candidate prediction motion vector list.
• The code stream includes an identifier 1 indicating the category of the newly generated candidate prediction motion vector, and an identifier 2 indicating the position of the selected candidate motion vector within the category of newly generated candidate prediction motion vectors.
  • the selected candidate motion vector is determined from the candidate prediction motion vector list according to the identifier 1 and the identifier 2 and a subsequent decoding process is performed.
• The spatial candidate prediction mode is exemplified by the five positions 252A to 252E shown in the figure described above.
• The spatial candidate prediction mode may further include, for example, positions within a preset distance from the image block to be processed but not adjacent to it. Exemplarily, such positions may be shown as 252F to 252J in FIG. 13.
• FIG. 13 is an exemplary schematic diagram of a coding unit and adjacent-position image blocks associated with the coding unit in an embodiment of the present application. Such positions fall within image blocks that are in the same image frame as the image block to be processed, that have been reconstructed by the time the image block to be processed is processed, and that are not adjacent to the image block to be processed.
• This type of position may be referred to as a spatial-domain non-adjacent image block, and a first, a second, and a third spatial-domain non-adjacent image block may be available.
• The physical meaning of "available" is as described above and is not repeated here.
• The candidate prediction motion mode list is checked and constructed in the following order. It should be understood that the check includes the "availability" check and the pruning process mentioned above, which are not repeated here.
• The candidate prediction mode list includes: the motion vector of the 252A position image block, the motion vector of the 252B position image block, the motion vector of the 252C position image block, the motion vector of the 252D position image block, a motion vector obtained by the alternative temporal motion vector prediction (ATMVP) technique, the motion vector of the 252E position image block, and a motion vector obtained by the spatio-temporal motion vector prediction (STMVP) technique.
• The ATMVP and STMVP techniques are detailed in sections 2.3.1.1 and 2.3.1.2 of JVET-G1001-v1, which is incorporated herein in its entirety and not repeated here.
  • the candidate prediction mode list includes the above 7 prediction motion vectors.
• The number of prediction motion vectors included in the candidate prediction mode list may be less than 7; for example, the first 5 may be taken to form the candidate prediction mode list. The motion vectors constructed by the feasible embodiments described in FIGS. 10 to 12 may also be added to the candidate prediction mode list so that it contains more predicted motion vectors.
• The motion vectors of the first spatial-domain non-adjacent image block, the second spatial-domain non-adjacent image block, and the third spatial-domain non-adjacent image block may be added to the candidate prediction mode list as predicted motion vectors of the image block to be processed.
• Assume that the motion vector of the 252A position image block, the motion vector of the 252B position image block, the motion vector of the 252C position image block, the motion vector of the 252D position image block, the motion vector obtained by the ATMVP technique, the motion vector of the 252E position image block, and the motion vector obtained by the STMVP technique are MVL, MMU, MVUR, MVDL, MVA, MVUL, and MVS, respectively.
• Assume that the motion vectors of the first, second, and third spatial-domain non-adjacent image blocks are MV0, MV1, and MV2, respectively. The candidate prediction motion vector list may then be checked and constructed in one of the following orders:
  • Example 1 MVL, MMU, MVUR, MVDL, MV0, MV1, MV2, MVA, MVUL, MVS;
  • Example 2 MVL, MMU, MVUR, MVDL, MVA, MV0, MV1, MV2, MVUL, MVS;
  • Example 3 MVL, MMU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
  • Example 4 MVL, MMU, MVUR, MVDL, MVA, MVUL, MVS, MV0, MV1, MV2;
  • Example 5 MVL, MMU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MVS, MV2;
  • Example 6 MVL, MMU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MV2, MVS;
  • Example 7 MVL, MMU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
• The candidate prediction motion vectors may be used in the Merge mode or AMVP mode described above, or in other prediction modes that obtain the predicted motion vector of the image block to be processed; they may be used at the encoding end, or used at the decoding end consistently with the corresponding encoding end, without limitation.
• The number of candidate prediction motion vectors in the candidate prediction motion vector list is also preset and kept consistent at the encoding and decoding ends; the specific number is not limited.
• Examples 1 to 7 give several feasible compositions of the candidate prediction motion vector list. Based on the motion vectors of spatial-domain non-adjacent image blocks, other compositions of the candidate prediction motion vector list are possible; the arrangement of the candidate prediction motion vectors in the list is not limited.
• This embodiment of the present application provides another method for constructing a candidate prediction motion vector list. Compared with candidate prediction motion vector lists such as those of Examples 1 to 7, this embodiment combines candidate prediction motion vectors determined in other embodiments with preset vector differences to form new candidate prediction motion vectors, which overcomes the shortcoming of low prediction accuracy of the predicted motion vector and improves coding efficiency.
  • the candidate prediction motion vector list of the image block to be processed includes two sub-lists: a first motion vector set and a vector difference set.
• For the composition of the first motion vector set, reference may be made to the various configurations in the foregoing embodiments of the present invention.
  • the vector difference set includes one or more preset vector differences.
• Each vector difference in the vector difference set is added to the original target motion vector determined from the first motion vector set, and the sums of the vector differences and the original target motion vector form a new motion vector set.
• The candidate prediction motion vector list shown in FIG. 14A may include the vector difference set as a subset, the subset being indicated in the candidate prediction motion vector list by an identification bit (MV vector difference set).
  • each vector difference is indicated by an index in the vector difference set
  • a candidate prediction motion vector list constructed is shown in FIG. 14B.
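The subset arrangement of FIGS. 14A/14B can be sketched as below. All values and names are illustrative only: the list holds first motion vectors directly, the vector-difference subset is addressed by its own per-difference index, and each difference is applied to an original target motion vector chosen from the first motion vector set.

```python
# Sketch: a first motion vector set plus a vector difference set held as a
# subset; a new motion vector is the chosen base plus the indexed difference.
first_mv_set = [(2, -3), (0, 1)]
vector_diff_set = [(1, 0), (0, -1), (-1, 0), (0, 1)]

def expand_with_differences(base_index, diff_index):
    base = first_mv_set[base_index]     # original target motion vector
    diff = vector_diff_set[diff_index]  # preset vector difference, by index
    return (base[0] + diff[0], base[1] + diff[1])
```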
• The manner of indicating a class of candidate motion vectors in the predicted motion vector list provided by the technology of the present application can be used in the Merge mode or AMVP mode described above, or in other prediction modes that obtain the predicted motion vector of the image block to be processed; it can be used at the encoding end, or used at the decoding end consistently with the corresponding encoding end, without limitation.
• The number of candidate prediction motion vectors in the candidate prediction motion vector list is also preset and kept consistent at the encoding and decoding ends; the specific number is not limited.
  • the method for decoding the predicted motion information provided in the embodiments of the present application will be described in detail with reference to the accompanying drawings.
• A type of candidate motion information is indicated in the list to control the length of the list, and the method for decoding predicted motion information provided in the embodiment of the present application is developed on this basis.
• The method is executed by a decoding device, which may be the video decoder 200 in the video decoding system 1 shown in FIG. 1, or may be a functional unit in the video decoder 200; this is not specifically limited in this application.
  • FIG. 15 is a schematic flowchart of an embodiment of the present application, which relates to a decoding method for predicting motion information, and specifically may include:
  • the decoding device parses the code stream to obtain a first identifier.
  • the code stream is sent by the encoding end after encoding the current image block.
  • the first identifier indicates the position of the selected candidate motion information when the encoding end encodes the current image block.
• The first identifier is used by the decoding device to determine the selected candidate motion information and further predict the motion information of the image block to be processed.
  • the first identifier may be a specific index of the selected candidate motion information. In this case, the first identifier may uniquely determine one candidate motion information.
  • the first identifier may be an identifier of a category to which the selected candidate motion information belongs.
• The code stream further includes a fourth identifier to indicate the specific position of the selected candidate motion information within its own category.
• The first identifier may adopt a fixed-length encoding method.
• For example, the first identifier may be a 1-bit identifier, in which case the number of categories it can indicate is limited.
• Alternatively, the first identifier may adopt a variable-length encoding method.
  • the decoding device determines a target element from the first candidate set according to the first identifier.
  • the content of the first candidate set may include the following two possible implementations:
  • Elements in the first candidate set include at least one first candidate motion information and at least one second candidate set, and elements in the second candidate set include multiple second candidate motion information.
  • Elements in the first candidate set may include at least one first candidate motion information and a plurality of second candidate motion information, the first candidate motion information includes the first motion information, and the second candidate motion information includes a preset Motion information offset. New motion information may be generated according to the first motion information and a preset motion information offset.
  • the first candidate set may be a constructed candidate motion information list.
  • at least one first candidate motion information is directly included, and a plurality of second candidate motion information is included in the first candidate set in the form of a second candidate set.
  • the second candidate motion information and the first candidate motion information are different.
  • the first candidate motion information and the second candidate motion information included in each second candidate set may be candidate motion information determined by using different MV prediction modes, or may be different types of candidate motion information. This embodiment of the present application does not specifically limit this.
  • the first candidate motion information may be motion information acquired in a Merge manner
  • the second candidate motion information may be motion information acquired in an Affine Merge manner.
  • the first candidate motion information may be original candidate motion information
  • the second candidate motion information may be motion information generated according to the original candidate motion information
  • an identification bit in the list is used to indicate a candidate motion information set.
  • the identification bit can be located at any position in the list, which is not specifically limited in the embodiment of the present application.
  • the identification bit may be located at the end of the list as shown in FIG. 16A; or, the identification bit may be located at the middle of the list as shown in FIG. 16B.
  • the first identifier in the code stream indicates the identifier bit
  • it is determined that the target element is a candidate motion information set indicated by the identifier bit.
  • the candidate motion information set indicated by the identification bit includes a plurality of second candidate motion information. For the candidate motion information set pointed to by the identification bit, one of the candidate motion information is selected as the target motion information according to the further identification (the second identification in S1504), and used to predict the motion information of the image block to be processed.
  • an identification bit in the list is used to indicate a candidate motion information set.
  • the identification bit can be located at any position in the list, which is not specifically limited in the embodiment of the present application.
  • the identification bit may be located at the end of the list as shown in FIG. 16A; or, the identification bit may be located at the middle of the list as shown in FIG. 16B.
  • the first identifier in the code stream indicates the identifier bit
  • it is determined that the target element is a plurality of second candidate motion information indicated by the identifier bit.
  • the second candidate motion information includes a preset motion information offset.
• One of the candidate motion information is selected according to the further identifier (the second identifier in S1504), and the target motion information is determined based on the selected second candidate motion information, to predict the motion information of the image block to be processed.
• In a possible implementation, more than one identification bit may be added to the Merge candidate list, each identification bit pointing to a specific candidate motion information set or to a plurality of motion information including preset motion information offsets.
  • the first identifier in the code stream indicates a certain identifier bit
• It is determined that the target element is the candidate motion information in the candidate motion information set indicated by that identification bit, or the target motion information is determined according to one of the multiple candidate motion information (including preset motion information offsets) indicated by that identification bit.
• FIGS. 16A, 16B, and 16C introduce an identification (pointer) into the Merge list so that candidates can be brought in as a subset.
• In this way, the length of the candidate list is greatly reduced and the complexity of list construction is lowered, which helps simplify the hardware implementation.
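A possible layout of such a list, with ordinary candidates and identification bits coexisting as in FIGS. 16A to 16C, can be sketched as follows. The entries and names are hypothetical:

```python
# Sketch: a Merge list where some slots are motion vectors and one slot is a
# flag (identification bit) pointing at a whole candidate motion information
# set. The first identifier selects a slot; if the slot is a flag, a further
# identifier picks within the referenced set.
merge_list = [
    ('mv', (2, -3)),
    ('mv', (0, 1)),
    ('flag', [(5, 5), (6, 6)]),  # identification bit pointing at a subset
]

def resolve(first_id, second_id=None):
    kind, payload = merge_list[first_id]
    if kind == 'mv':
        return payload
    return payload[second_id]    # flag case: second identifier picks in the set
```

Because the subset occupies a single slot, the list stays short no matter how many candidates the subset contains, which is the length-control point made above.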
  • the first candidate motion information may include motion information of spatially adjacent image blocks of the image block to be processed. It should be noted that the definition of the motion information of the adjacent image blocks in the spatial domain has been described in the foregoing, and is not repeated here.
  • the second candidate motion information may include motion information of a spatial domain non-adjacent image block of the image block to be processed. It should be noted that the definition of motion information of non-adjacent image blocks in the spatial domain has been described in the foregoing, and is not repeated here.
  • the method for acquiring the first motion information may be selected according to actual requirements, which is not specifically limited in the embodiment of the present application.
  • the value of the preset motion information offset used to obtain the second motion information may be a fixed value or a value selected from a set.
• Neither the content nor the form of the preset motion information offset is specifically limited in this embodiment of the present application.
  • the first candidate motion information includes first motion information
  • at least one second candidate set is a plurality of second candidate sets
• the plurality of second candidate sets includes at least one third candidate set and at least one fourth candidate set
• the elements of the third candidate set include motion information of a plurality of spatial-domain non-adjacent image blocks of the image block to be processed
• the elements of the fourth candidate set include a plurality of motion information obtained based on the first motion information and preset motion information offsets.
  • the at least one second candidate set is a plurality of second candidate sets
  • the plurality of second candidate sets includes at least one fifth candidate set and at least one sixth candidate set.
• The elements in the fifth candidate set include motion information of a plurality of spatial-domain non-adjacent image blocks of the image block to be processed, and the elements in the sixth candidate set include a plurality of preset motion information offsets.
  • the coding codeword used to identify the first motion information is the shortest.
  • the first motion information does not include motion information obtained according to the ATMVP mode.
  • the first identifier may be an index in the first candidate set or an identifier classified in the motion information.
  • S1502 may be implemented in the following two cases:
  • the first identifier is an index in the first candidate set.
• The decoding device in S1502 may determine the element at the position indicated by the first identifier in the first candidate set as the target element. Since the first candidate set includes at least one first candidate motion information and at least one second candidate set, the target element determined according to the first identifier may be first candidate motion information or a second candidate set, depending on the content arranged at the position indicated by the first identifier.
• The decoding device in S1502 may determine the element at the position indicated by the first identifier in the first candidate set as the target element. Since the first candidate set includes at least one first candidate motion information and a plurality of second candidate motion information, the target element determined according to the first identifier may be first candidate motion information or information obtained based on a plurality of second candidate motion information, depending on the content arranged at the position indicated by the first identifier.
  • the first identifier is an identifier of candidate motion information classification.
  • the decoding device in S1502 determines the classification to which the target element belongs according to the first identifier.
  • the decoding device parses the bitstream to obtain a fourth identifier, the fourth identifier indicates a specific position of the target element in its classification, and uniquely determines the target element in its classification according to the fourth identifier. Specifically, if the first identifier indicates that the target element belongs to the classification of the first candidate motion information, one first candidate motion information is determined as the target element among the at least one first candidate motion information according to the fourth identifier. If the first identifier indicates that the target element belongs to a certain category of the second candidate motion information, a second candidate set or a second candidate motion information as the target element is determined according to the fourth identifier.
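The category-identifier case just described can be sketched as below. The category layout and contents are placeholder values chosen only to show the two-level lookup (first identifier selects the category, fourth identifier selects within it):

```python
# Sketch of S1502, case 2: the first identifier names a category and the
# fourth identifier gives the position inside that category.
first_candidates = ['merge_A', 'merge_B']            # category 0: first candidate motion information
second_candidate_sets = [['affine_1a', 'affine_1b'],  # category 1: second candidate sets
                         ['affine_2a', 'affine_2b']]

def pick_target(first_id, fourth_id):
    if first_id == 0:  # category of first candidate motion information
        return first_candidates[fourth_id]
    # category of second candidate motion information: a whole second
    # candidate set is determined as the target element
    return second_candidate_sets[fourth_id]
```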
  • the first candidate motion information is Merge motion information
  • the first candidate set includes two second candidate sets
  • the second candidate motion information in one second candidate set is Affine Merge motion information of the first type.
  • the second candidate motion information in another second candidate set is Affine Merge motion information of the second type.
  • a configuration identifier of 0 indicates Merge motion information
  • an identifier of 1 indicates Affine Merge motion information.
• If the first identifier obtained by the decoding device by parsing the code stream in S1501 is 1, the decoding device obtains the fourth identifier by parsing the code stream in S1502 and, according to the fourth identifier, determines one of the two second candidate sets as the target element.
  • the first candidate motion information is Merge motion information
  • the first candidate set includes two second candidate sets
• The second candidate motion information in one second candidate set is a preset motion information offset corresponding to the first type of Affine Merge motion information, and the second candidate motion information in the other second candidate set is a preset motion information offset corresponding to the second type of Affine Merge motion information.
  • a configuration identifier of 0 indicates Merge motion information
  • an identifier of 1 indicates Affine Merge motion information.
• If the first identifier is 0, one Merge motion information among the at least one Merge motion information in the first candidate set is determined as the target element. If the first identifier obtained by the decoding device by parsing the code stream in S1501 is 1, the decoding device obtains the fourth identifier by parsing the code stream in S1502, determines one second candidate set from the two second candidate sets according to the fourth identifier, and determines the target element based on one of the second candidate motion information in that second candidate set.
• If in S1502 the decoding device determines that the target element is first candidate motion information, S1503 is executed; if in S1502 the decoding device determines that the target element is a second candidate set, or is to be obtained according to multiple second candidate motion information, S1504 is executed.
  • the target motion information is used to predict the motion information of the image block to be processed.
  • the target motion information is used to predict the motion information of the image block to be processed, which can be specifically implemented as: using the target motion information as the motion information of the image block to be processed; or, using the target motion information as the predicted motion of the image block to be processed information.
  • a specific implementation of selecting target motion information to predict motion information of an image block to be processed may be selected according to actual requirements, which is not specifically limited here.
• In S1504, the code stream is parsed to obtain a second identifier, and the target motion information is determined based on one of the plurality of second candidate motion information. This may be specifically implemented as: parsing the code stream to obtain the second identifier, and determining the target motion information from the plurality of second candidate motion information according to the second identifier.
  • the second identifier may adopt a fixed-length encoding method.
• For example, the second identifier may be a 1-bit identifier, in which case the number of positions it can indicate is limited.
  • the second identifier may adopt a variable-length encoding method.
  • the second identifier may be a plurality of bit identifiers.
• Determining the target motion information may be achieved by one of the following feasible implementations, but is not limited thereto.
• When the first candidate motion information includes the first motion information, the second candidate motion information includes second motion information, and the second motion information is obtained based on the first motion information and a preset motion information offset.
• The second identifier may indicate the specific position of the target motion information in the second candidate set. The decoding device in S1504 determines the target motion information from the plurality of second candidate motion information according to the second identifier, which may be specifically implemented as: determining the second candidate motion information at the position indicated by the second identifier, in the second candidate set determined as the target element, as the target motion information.
• Alternatively, the second identifier may indicate the specific position of the target offset in the second candidate set.
• The decoding device in S1504 determines the target motion information from the plurality of second candidate motion information according to the second identifier, which may be specifically implemented as: determining the target offset from a plurality of preset motion information offsets according to the second identifier, and determining the target motion information based on the first motion information and the target offset.
• The method for decoding predicted motion information may further include: multiplying each of the plurality of preset motion information offsets by a preset coefficient to obtain a plurality of adjusted motion information offsets.
• Correspondingly, determining the target offset from the plurality of preset motion information offsets according to the second identifier includes: determining the target offset from the plurality of adjusted motion information offsets according to the second identifier.
• Determining the target motion information based on one of the plurality of second candidate motion information may alternatively be specifically implemented as: determining a motion information offset from the plurality of preset motion information offsets according to the second identifier and multiplying it by the preset coefficient to obtain the target offset; then determining the target motion information based on the first motion information and the target offset.
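The coefficient variant can be sketched as follows. The offsets and coefficient are illustrative; per the text, the coefficient may be fixed in the decoding device or carried in the code stream:

```python
# Sketch: each preset offset is multiplied by a preset coefficient before the
# second identifier selects the target offset; the target motion vector is the
# first motion vector plus the adjusted offset.
preset_offsets = [(1, 0), (0, -1), (-1, 0), (0, 1)]

def derive_target_with_coeff(first_mv, second_id, coeff):
    dx, dy = preset_offsets[second_id]
    target_off = (dx * coeff, dy * coeff)  # adjusted motion information offset
    return (first_mv[0] + target_off[0], first_mv[1] + target_off[1])
```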
  • the preset coefficient may be a fixed coefficient configured in the decoding device, or may be a coefficient carried in a code stream, which is not specifically limited in this embodiment of the present application.
  • the method for decoding prediction motion information provided in this application may further include S1505.
  • the third identifier includes a preset coefficient.
  • the elements in the first candidate set include the first candidate motion information and at least one second candidate set, or the elements in the first candidate set include the first candidate motion information and a plurality of second candidate motion information.
  • an entire set of candidate motion information can be added to the first candidate set as a single element; compared with adding each candidate motion information to the first candidate set directly, this greatly shortens the length of the first candidate set.
  • when the first candidate set is a candidate motion information list used for inter prediction, the length of the candidate motion information list can be well controlled even if more candidates are introduced, which facilitates the checking process and hardware implementation.
  • the candidate motion information corresponding to the first index 0-5 includes a motion vector and a reference image
  • the first index 6 corresponds to new motion information generated based on the candidate motion information corresponding to the index 0 and a preset motion vector offset.
  • the candidate motion information corresponding to the first index 0 is forward prediction
  • the motion vector is (2, -3)
  • the reference frame POC is 2.
  • the preset motion vector offsets are (1, 0), (0, -1), (-1, 0), (0, 1).
  • if the first index value obtained by decoding is 6, it indicates that the motion information used by the current image block is new motion information generated based on the candidate motion information corresponding to the first index 0 and a preset motion vector offset, and the code stream is then further decoded to obtain the second index value.
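The first example above can be sketched as a small decode step: first indexes 0-5 return ordinary Merge candidates, while first index 6 derives new motion information from the candidate at first index 0 plus a preset offset chosen by the second index. The data structures are illustrative assumptions, not the patent's normative representation.

```python
# Unidirectional example: base candidate MV (2, -3), forward prediction, ref POC 2.
base_mv = (2, -3)
offsets = [(1, 0), (0, -1), (-1, 0), (0, 1)]  # preset motion vector offsets

def decode_motion(first_index, second_index=None, candidates=None):
    """Return an ordinary Merge candidate for first indexes 0-5; for first
    index 6, add the offset selected by the second index to the base MV."""
    if first_index < 6:
        return candidates[first_index]          # ordinary Merge candidate
    dx, dy = offsets[second_index]              # second index selects the offset
    return (base_mv[0] + dx, base_mv[1] + dy)   # new motion information

print(decode_motion(6, second_index=3))  # -> (2, -2)
```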
  • the candidate motion information corresponding to the first index 0-5 includes a motion vector and a reference image
  • the first index 6 corresponds to new motion information generated based on the candidate motion information corresponding to the first index 0 and a preset motion vector offset.
  • the motion information of the candidate corresponding to the first index 0 is bidirectional prediction
  • the forward motion vector is (2, -3)
  • the reference frame POC is 2
  • the backward motion vector is (-2, -1)
  • the preset motion vector offsets are (1,0), (0, -1), (-1, 0), (0, 1).
  • if the first index value obtained by decoding is 6, it indicates that the motion information used by the current image block is new motion information generated based on the candidate motion information corresponding to the first index 0 and a preset motion vector offset, and the code stream is then further decoded to obtain the second index value.
  • the motion information of the current image block is bidirectional prediction; when the current frame POC is 3, the forward and backward reference frames lie on opposite sides of the current frame in POC order.
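A hedged sketch of the bidirectional case above. The text states only that the forward and backward reference frames lie on opposite sides of the current frame; mirroring the offset for the backward motion vector in that case is an assumption here, following the common MMVD-style convention rather than anything this document specifies.

```python
def apply_offset_bidir(fwd_mv, bwd_mv, offset, opposite_sides=True):
    """Apply a motion vector offset in the bidirectional case. When the
    forward and backward reference frames lie on opposite sides of the
    current frame, the offset is mirrored for the backward motion vector
    (assumption: usual MMVD-style convention)."""
    dx, dy = offset
    new_fwd = (fwd_mv[0] + dx, fwd_mv[1] + dy)
    sign = -1 if opposite_sides else 1
    new_bwd = (bwd_mv[0] + sign * dx, bwd_mv[1] + sign * dy)
    return new_fwd, new_bwd

# Forward MV (2, -3), backward MV (-2, -1), offset (1, 0), current POC 3
print(apply_offset_bidir((2, -3), (-2, -1), (1, 0)))  # -> ((3, -3), (-3, -1))
```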
  • the candidate motion information corresponding to the first indexes 0-5 includes a motion vector and a reference image. It is assumed that the candidate motion information indicated by the first index 0 is composed of sub-block motion information, and that the candidate motion information corresponding to the first index 1 is not composed of sub-block motion information: it is forward prediction, the motion vector is (2, -3), and the reference frame POC is 2.
  • the first index 6 corresponds to new motion information generated based on the candidate motion information corresponding to the first index 1 and a preset motion vector offset; the preset motion vector offsets are (1, 0), (0, -1), (-1, 0), (0, 1).
  • if the first index value obtained by decoding is 6, it indicates that the motion information used by the current image block is new motion information generated based on the candidate motion information corresponding to the first index 1 and a preset motion vector offset, and the code stream is then further decoded to obtain the second index value.
  • Let the maximum length of the Merge candidate list be 7.
  • the first index 0-6 indicates each candidate space in the Merge list.
  • the first index 6 indicates that the current block uses the motion information of the non-adjacent spatial candidate as the reference motion information of the current block.
  • Let the size of the non-adjacent spatial domain candidate set be 4; the available non-adjacent spatial domain candidates are put into the set in a preset checking order, and let the non-adjacent spatial domain candidate motion information in the set be as follows:
  • Second index 0 Candidate 0: Forward prediction, the motion vector is (2, -3), and the reference frame POC is 2.
  • Second index 1 Candidate 1: Forward prediction, the motion vector is (1, -3), and the reference frame POC is 4.
  • Second index 2 Candidate 2: Backward prediction, the motion vector is (2, -4), and the reference frame POC is 2.
  • Second index 3 Candidate 3: Bidirectional prediction, forward motion vector is (2, -3), reference frame POC is 2, backward motion vector is (2, -2), and reference frame POC is 4.
  • the first index value obtained by decoding is 6, it indicates that the current block uses the motion information of the non-adjacent spatial candidate as the reference motion information of the current block, and then is further decoded to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of candidate 1 in the non-adjacent spatial domain candidate set is used as the motion information of the current block.
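The two-level decoding just walked through (first index 6 selects the non-adjacent spatial candidate set as the target element; a second index is then parsed to select a candidate inside that set) can be sketched as follows, using the example candidate values. The field names and the callback for reading the second index are illustrative.

```python
# Example non-adjacent spatial candidate set from the text above.
non_adjacent_set = [
    {"dir": "fwd", "mv": (2, -3), "poc": 2},   # second index 0
    {"dir": "fwd", "mv": (1, -3), "poc": 4},   # second index 1
    {"dir": "bwd", "mv": (2, -4), "poc": 2},   # second index 2
    {"dir": "bi", "mv_fwd": (2, -3), "poc_fwd": 2,
     "mv_bwd": (2, -2), "poc_bwd": 4},         # second index 3
]

def decode(first_index, merge_list, read_second_index):
    if first_index < 6:
        return merge_list[first_index]          # ordinary Merge candidate
    # First index 6: the target element is the candidate *set*, so a
    # second index must be parsed from the code stream to pick the entry.
    return non_adjacent_set[read_second_index()]

chosen = decode(6, merge_list=[None] * 6, read_second_index=lambda: 1)
print(chosen["mv"])  # -> (1, -3)
```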
  • the candidate motion information corresponding to the first index 0 is forward prediction, the motion vector is (2, -3), and the reference frame POC is 2.
  • the first index 6 indicates new motion information generated based on candidate motion information corresponding to the first index 0 or motion information using non-adjacent spatial domain candidates as reference motion information of the current block.
  • Let the size of the non-adjacent spatial domain candidate set be 4; the available non-adjacent spatial domain candidates are put into the set in a preset checking order, and let the non-adjacent spatial domain candidate motion information in the set be as follows:
  • Second index 0 candidate 0: forward prediction, the motion vector is (-5, -3), and the reference frame POC is 2.
  • Second index 1 Candidate 1: Forward prediction, the motion vector is (1, -3), and the reference frame POC is 4.
  • Second index 2 Candidate 2: Backward prediction, the motion vector is (2, -4), and the reference frame POC is 2.
  • Second index 3 Candidate 3: Bidirectional prediction, forward motion vector is (2, -3), reference frame POC is 2, backward motion vector is (2, -2), and reference frame POC is 4.
  • Second index 4 Candidate 4: Forward prediction, the motion vector is (2, -3) + (1,0), and the reference frame POC is 2.
  • Second index 5 Candidate 5: Forward prediction, the motion vector is (2, -3) + (0, -1), and the reference frame POC is 2.
  • Second index 6 candidate 6: forward prediction, the motion vector is (2, -3) + (-1,0), and the reference frame POC is 2.
  • Second index 7 candidate 7: forward prediction, the motion vector is (2, -3) + (0, 1), and the reference frame POC is 2.
  • the first index value obtained by decoding is 6, it indicates that the current block uses new motion information generated based on candidate motion information corresponding to the first index 0 or uses non-adjacent spatial candidate motion information as reference motion information of the current block. Then it is further decoded to obtain a second index value.
  • if the second index value obtained by further decoding is 0, the motion information of candidate 0 (forward prediction, motion vector (-5, -3), reference frame POC 2) in the non-adjacent spatial domain candidate set is used as the motion information of the current block.
  • if the second index value obtained by further decoding is 5, the motion vector offset candidate 5 (forward prediction, motion vector (2, -3) + (0, -1), reference frame POC 2) is used as the motion information of the current block.
  • the first index 0-6 indicates each candidate space in the Merge list.
  • the motion information of the candidate corresponding to the first index 0 is forward prediction, the motion vector is (2, -3), and the reference frame POC is 2.
  • the first index 6 indicates that the motion information adopted by the current block is new motion information generated by offsetting the candidate motion information corresponding to the first index 0 according to a preset motion vector offset:
  • the second index value 0 indicates candidates with a spacing of 1, and the second index value 1 indicates candidates with a spacing of 2.
  • the third index value indicates which motion vector offset candidate is used.
  • the first index value obtained by decoding is 6, it indicates that the motion information used by the current block is new motion information generated based on the candidate motion information corresponding to the first index 0, and then further decoded to obtain a second index value.
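The spacing/direction signalling above can be sketched like this: the second index picks the offset magnitude (spacing 1 or 2) and the third index picks which preset offset is used as the direction. The direction table is assumed to be the four unit offsets used elsewhere in the examples.

```python
SPACINGS = [1, 2]                                # second index -> offset magnitude
DIRECTIONS = [(1, 0), (0, -1), (-1, 0), (0, 1)]  # third index -> offset direction

def offset_from_indices(second_index, third_index):
    """Combine spacing and direction into a motion vector offset."""
    s = SPACINGS[second_index]
    dx, dy = DIRECTIONS[third_index]
    return (dx * s, dy * s)

base_mv = (2, -3)                 # candidate at first index 0 in the example
off = offset_from_indices(1, 3)   # spacing 2, direction (0, 1)
print((base_mv[0] + off[0], base_mv[1] + off[1]))  # -> (2, -1)
```

Separating magnitude from direction in this way keeps each syntax element short while still covering spacing-times-direction combinations.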
  • the AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
  • Second index 0 AFFINE candidate 0;
  • Second index 1 AFFINE candidate 1;
  • Second index 2 AFFINE candidate 2
  • Second index 3 AFFINE candidate 3;
  • the first index value obtained by decoding is 6, it indicates that one of the candidates in the motion information candidate set obtained by AFFINE is the reference motion information, and then further decoded to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of the AFFINE candidate 1 is used as the motion information of the current block.
  • the neighboring spatial motion information candidate set includes four neighboring spatial motion information candidates:
  • Second index 0 neighboring spatial candidate 0;
  • Second index 1 neighboring spatial candidate 1;
  • Second index 2 neighboring spatial candidate 2;
  • Second index 3 neighboring spatial candidate 3;
  • the first index value obtained by decoding is 6, it indicates that one of the candidates in the motion information candidate set obtained by using the neighboring space for the current block is the reference motion information, and then further decoded to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of the neighboring spatial domain candidate 1 is used as the motion information of the current block.
  • the first index 0-6 indicates each candidate space in the Merge list.
  • the first index 6 indicates that one candidate in the motion information candidate set obtained by using the neighboring time domain for the current block is reference motion information.
  • the adjacent temporal motion information candidate set includes four adjacent temporal motion information candidates:
  • Second index 0 adjacent time domain candidate 0;
  • Second index 1 adjacent time domain candidate 1;
  • Second index 2 Adjacent time domain candidate 2
  • Second index 3 adjacent time domain candidate 3;
  • the first index value obtained by decoding is 6, it indicates that one of the candidates in the motion information candidate set obtained by the neighboring time domain is used as the reference motion information, and then further decoded to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of the neighboring time domain candidate 1 is used as the motion information of the current block.
  • the motion information candidate set composed of sub-block motion information includes AFFINE motion information candidates, ATMVP, and STMVP candidates:
  • Second index 0 AFFINE candidate
  • Second index 1 ATMVP candidate
  • Second index 2 STMVP candidates
  • the first index value obtained by decoding is 6, it indicates that the current block uses one candidate of the motion information candidate set composed of the sub-block motion information as the reference motion information, and then further decodes to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of the ATMVP candidate is used as the motion information of the current block.
  • spaces 0-5 in the list are motion information obtained by using Merge
  • space 6 is a motion information candidate set obtained by AFFINE.
  • the first index 0 indicates that the current block uses the motion information obtained by Merge as the reference motion information
  • the first index 1 indicates that one of the candidates in the motion information candidate set obtained by the current block using AFFINE is the reference motion information.
  • the AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
  • Second index 0 AFFINE candidate 0;
  • Second index 1 AFFINE candidate 1;
  • Second index 2 AFFINE candidate 2
  • Second index 3 AFFINE candidate 3;
  • when the first index value obtained by decoding is 1, it indicates that one of the candidates of the motion information candidate set obtained by AFFINE is used as the reference motion information, and the code stream is then further decoded to obtain the second identification value.
  • the second identification value obtained by further decoding is 1, the motion information of the AFFINE candidate 1 is used as the motion information of the current block;
  • when the first index value obtained by decoding is 0, it indicates that the current block uses the motion information obtained by Merge as the reference motion information, and the code stream is then further decoded to obtain a fourth index value.
  • the fourth index value obtained by further decoding is 2, the motion information of space 2 in the Merge candidate list is used as the motion information of the current block.
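The routing in this example, where first index 0 leads to the Merge list (indexed by a fourth index) and first index 1 leads to the AFFINE candidate set (indexed by a second identifier), can be sketched as follows with placeholder candidates.

```python
# Placeholder candidates; only the routing logic is the point of the sketch.
merge_list = [f"merge_{i}" for i in range(6)]   # spaces 0-5: Merge candidates
affine_set = [f"affine_{i}" for i in range(4)]  # space 6: AFFINE candidate set

def decode(first_index, sub_index):
    if first_index == 0:
        return merge_list[sub_index]   # fourth index into the Merge list
    if first_index == 1:
        return affine_set[sub_index]   # second identifier into the AFFINE set
    raise ValueError("unexpected first index")

print(decode(0, 2))  # -> merge_2
print(decode(1, 1))  # -> affine_1
```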
  • spaces 0-3 in the list are motion information obtained using Merge
  • space 4 is a motion information candidate set obtained using adjacent time domains
  • space 5 is a motion information candidate set composed of sub-block motion information
  • Space 6 is the candidate set of motion information obtained by AFFINE.
  • the first index 0 indicates that the current block uses the motion information obtained by Merge as the reference motion information
  • the first index 1 indicates that one of the candidates in the motion information candidate set obtained by the current block using AFFINE is the reference motion information
  • the first index 01 indicates that one of the candidates of the motion information candidate set obtained by using the adjacent time domain for the current block is the reference motion information
  • the first index 11 indicates that the current block uses one candidate of the motion information candidate set composed of sub-block motion information as the reference motion information.
  • AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
  • Second identification 0 AFFINE candidate 0;
  • Second identification 1 AFFINE candidate 1;
  • Second identifier 2 AFFINE candidate 2;
  • Second identification 3 AFFINE candidate 3;
  • the adjacent temporal motion information candidate set includes four adjacent temporal motion information candidates:
  • Second index 0 adjacent time domain candidate 0;
  • Second index 1 adjacent time domain candidate 1;
  • Second index 2 Adjacent time domain candidate 2
  • Second index 3 adjacent time domain candidate 3;
  • the motion information candidate set composed of sub-block motion information includes AFFINE motion information candidates, ATMVP, and STMVP candidates:
  • Second index 0 AFFINE candidate
  • Second index 1 ATMVP candidate
  • Second index 2 STMVP candidates
  • when the first index value obtained by decoding is 0, it indicates that the current block uses the motion information obtained by Merge as the reference motion information, and the code stream is then further decoded to obtain a fourth index value.
  • the fourth index value obtained by further decoding is 2, the motion information of space 2 in the Merge candidate list is used as the motion information of the current block.
  • when the first index value obtained by decoding is 1, it indicates that one of the candidates of the motion information candidate set obtained by AFFINE is used as the reference motion information, and the code stream is then further decoded to obtain the second identification value.
  • the second identification value obtained by further decoding is 1, the motion information of the AFFINE candidate 1 is used as the motion information of the current block.
  • when the first index value obtained by decoding is 01, it indicates that one of the candidates in the motion information candidate set obtained by the neighboring time domain is used as the reference motion information, and the code stream is then further decoded to obtain a second identification value.
  • the second identification value obtained by further decoding is 2, the motion information of the neighboring time-domain candidate 2 is used as the motion information of the current block.
  • when the first index value obtained by decoding is 11, it indicates that the current block uses one candidate of the motion information candidate set composed of the sub-block motion information as the reference motion information, and the code stream is then further decoded to obtain the second index value.
  • the second index value obtained by further decoding is 1, the motion information of the ATMVP candidate is used as the motion information of the current block.
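The dispatch in this last example can be sketched by treating the decoded first index values ("0", "1", "01", "11") as opaque codewords, exactly as listed in the text, each mapped to its candidate set; a second-level index then selects the candidate within the set. The set contents are placeholders.

```python
# Placeholder candidate sets matching the four routes in the example.
merge_list = [f"merge_{i}" for i in range(4)]      # spaces 0-3: Merge candidates
temporal_set = [f"temporal_{i}" for i in range(4)]  # adjacent time domain set
subblock_set = ["AFFINE", "ATMVP", "STMVP"]         # sub-block motion info set
affine_set = [f"affine_{i}" for i in range(4)]      # AFFINE candidate set

DISPATCH = {"0": merge_list, "1": affine_set,
            "01": temporal_set, "11": subblock_set}

def decode(first_index, sub_index):
    """Route on the first index codeword, then pick by the second-level index."""
    return DISPATCH[first_index][sub_index]

print(decode("0", 2))   # -> merge_2
print(decode("01", 2))  # -> temporal_2
print(decode("11", 1))  # -> ATMVP
```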
  • An embodiment of the present application provides a decoding device for predicting motion information.
  • the device may be a video decoder, a video encoder, or a decoder.
  • the decoding apparatus for predicting motion information is configured to perform the steps performed by the decoding apparatus in the decoding method for predicting motion information.
  • the decoding apparatus for predicting motion information provided in the embodiment of the present application may include a module corresponding to a corresponding step.
  • the functional modules of the prediction motion information decoding device may be divided according to the foregoing method example.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules.
  • the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 17 illustrates a possible structural diagram of a decoding apparatus for predicting motion information involved in the foregoing embodiment.
  • the decoding apparatus 1700 for predicting motion information may include an analysis module 1701, a determination module 1702, and an assignment module 1703.
  • the functions of each module are as follows:
  • the analysis module 1701 is configured to parse a code stream to obtain a first identifier.
  • a determining module 1702 is configured to determine a target element from a first candidate set according to a first identifier, and the elements in the first candidate set include at least one first candidate motion information and at least one second candidate set.
  • the second candidate set includes a plurality of second candidate motion information; the first candidate motion information includes the first motion information, and the second candidate motion information includes a preset motion information offset.
  • the assignment module 1703 is configured to use the first candidate motion information as the target motion information when the target element is the first candidate motion information, and the target motion information is used to predict the motion information of the image block to be processed.
  • the analysis module 1701 is further configured to parse the code stream to obtain a second identifier when the target element is the second candidate set, and the determination module 1702 is further configured to determine the target motion information from the plurality of second candidate motion information according to the second identifier. Alternatively, the parsing module 1701 is configured to parse the code stream to obtain the second identifier when the target element is obtained according to the plurality of second candidate motion information, and the determination module 1702 determines the target motion information based on one of the plurality of second candidate motion information according to the second identifier.
  • the analysis module 1701 is configured to support the decoding device 1700 for predicting motion information to perform S1501, S1505, and the like in the above embodiments, and / or other processes used in the technology described herein.
  • the determining module 1702 is configured to support the decoding apparatus 1700 for predicting motion information to perform S1502 and the like in the above embodiments, and / or other processes used in the technology described herein.
  • the assignment module 1703 is configured to support the decoding device 1700 for predicting motion information to perform S1502 and the like in the above embodiments, and / or other processes used in the technology described herein.
  • the analysis module 1701 is further configured to parse the code stream to obtain a third identifier, where the third identifier includes a preset coefficient.
  • the decoding apparatus 1700 for predicting motion information may further include a calculation module 1704, configured to multiply the plurality of preset motion information offsets by the preset coefficient to obtain a plurality of adjusted motion information offsets.
  • the determination module 1702 is specifically configured to determine the target offset from the plurality of adjusted motion information offsets according to the second identifier.
  • FIG. 18 is a schematic structural block diagram of a decoding device 1800 for predicting motion information in an embodiment of the present application.
  • the decoding device 1800 for predicting motion information includes: a processor 1801 and a memory 1802 coupled to the processor; the processor 1801 is configured to perform the functions of the embodiment shown in FIG. 17 and its various feasible implementations.
  • the processing module 1801 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA, or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the storage module 1802 may be a memory.
  • the above-mentioned prediction motion information decoding device 1700 and the prediction motion information decoding device 1800 may both execute the above-mentioned prediction motion information decoding method shown in FIG. 15.
  • the prediction motion information decoding device 1700 and the prediction motion information decoding device 1800 may specifically be a video decoding device or other equipment with a video codec function.
  • the decoding apparatus 1700 for predicting motion information and the decoding apparatus 1800 for predicting motion information may be used to perform image prediction in the decoding process.
  • An embodiment of the present application provides an inter prediction device.
  • the inter prediction device may be a video decoder, a video encoder, or a decoder.
  • the inter prediction apparatus is configured to perform the steps performed by the inter prediction apparatus in the above inter prediction method.
  • the inter prediction apparatus provided in the embodiment of the present application may include a module corresponding to a corresponding step.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules.
  • the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • the present application also provides a terminal, which includes: one or more processors, a memory, and a communication interface.
  • the memory and the communication interface are coupled to one or more processors; the memory is used to store computer program code, and the computer program code includes instructions.
  • when the one or more processors execute the instructions, the terminal performs the decoding method for predicted motion information in the embodiments of the present application.
  • the terminal here can be a video display device, a smart phone, a portable computer, and other devices that can process or play videos.
  • the present application also provides a video decoder including a non-volatile storage medium and a central processing unit. The non-volatile storage medium stores an executable program; the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the decoding method for predicted motion information in the embodiments of the present application.
  • the present application further provides a decoder, which includes a decoding apparatus for predicting motion information in the embodiment of the present application.
  • Another embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium includes one or more program codes, the one or more programs include instructions, and when a processor in a terminal executes the program code, the terminal performs the decoding method for predicted motion information shown in FIG. 15.
  • a computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of the terminal may read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the terminal to perform the decoding method for predicted motion information shown in FIG. 15.
  • all or part of them may be implemented by software, hardware, firmware, or any combination thereof.
  • when implemented using a software program, the foregoing may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions according to the embodiments of the present application are generated.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)), and the like.
  • the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit.
  • the computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another, for example according to a communication protocol.
  • computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) a communication medium such as a signal or carrier wave.
  • a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
  • the computer program product may include a computer-readable medium.
  • the computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media.
  • magnetic disks and optical discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where magnetic disks typically reproduce data magnetically, while optical discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • accordingly, the term "processor" as used herein may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
  • functionality described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
  • the techniques of this application can be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or sets of ICs (e.g., a chipset).
  • Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiments of the present application relate to a decoding method and device for predicted motion information. Said method comprises: parsing a code stream to obtain a first identifier; according to the first identifier, determining from a first candidate set a target element, an element in the first candidate set comprising at least one piece of first candidate motion information and a plurality of pieces of second candidate motion information, the first candidate motion information including first motion information, and the second candidate motion information including a preset motion information offset; when the target element is the first candidate motion information, using the first candidate motion information as target motion information, the target motion information being used to predict motion information concerning an image block to be processed; and when the target element is obtained according to the plurality of pieces of second candidate motion information, parsing the code stream to obtain a second identifier, and according to the second identifier and on the basis of one of the plurality of pieces of second candidate motion information, determining the target motion information.

Description

Decoding method and device for predicted motion information
This application claims priority to Chinese Patent Application No. 201811068957.4, filed with the State Intellectual Property Office on September 13, 2018 and entitled "Video encoding and decoding method and apparatus", and to Chinese Patent Application No. 201811264674.7, filed with the State Intellectual Property Office on October 26, 2018 and entitled "Decoding method and apparatus for predicted motion information", both of which are incorporated herein by reference in their entireties.
Technical field
This application relates to the field of video encoding and decoding technologies, and in particular, to a decoding method and apparatus for predicted motion information.
Background
Digital video technologies can be widely used in various apparatuses, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop computers, tablet computers, e-book readers, digital cameras, digital recording apparatuses, digital media players, video game apparatuses, video game consoles, cellular or satellite radio telephones, video teleconferencing apparatuses, video streaming apparatuses, and the like. Digital video apparatuses implement video decoding technologies to efficiently send, receive, encode, decode, and/or store digital video information.
Among video decoding technologies, video compression technologies are particularly important. Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in a video sequence. The basic principle of video compression is to remove redundancy as much as possible by exploiting correlations in the spatial domain, in the temporal domain, and between codewords. A currently popular approach is a block-based hybrid video coding framework, in which video compression is achieved through steps such as prediction (including intra prediction and inter prediction), transform, quantization, and entropy coding.
Inter prediction exploits temporal correlation in a video by using pixels of adjacent encoded pictures to predict pixels of a current picture, so as to effectively remove temporal redundancy. In inter prediction, predicted motion information of each picture block is determined from a candidate motion information list, and a prediction block of the picture block is then generated through a motion compensation process. The motion information includes reference picture information and a motion vector. The reference picture information includes unidirectional/bidirectional prediction information, a reference picture list, and a reference picture index corresponding to the reference picture list. The motion vector is a position offset in the horizontal and vertical directions.
Currently, there are many inter prediction modes, including a merge (Merge) mode, an affine merge (Affine Merge) mode, an advanced motion vector prediction (Advanced Motion Vector Prediction, AMVP) mode, an affine AMVP (Affine AMVP) mode, and so on.
To improve accuracy of inter prediction, introducing more candidates makes the candidate motion information list increasingly long, which is disadvantageous for the checking process and for hardware implementation.
Summary
Embodiments of this application provide a decoding method and apparatus for predicted motion information, which can effectively control the length of a candidate motion information list when more candidate motion information is introduced.
To achieve the foregoing objective, the embodiments of this application use the following technical solutions.
According to a first aspect of the embodiments of this application, a decoding method for predicted motion information is provided, including: parsing a bitstream to obtain a first identifier; determining a target element from a first candidate set according to the first identifier, where elements in the first candidate set include at least one piece of first candidate motion information and a plurality of pieces of second candidate motion information, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset; when the target element is the first candidate motion information, using the first candidate motion information serving as the target element as target motion information, where the target motion information is used to predict motion information of a to-be-processed picture block; and when the target element is obtained according to the plurality of pieces of second candidate motion information, parsing the bitstream to obtain a second identifier, and determining the target motion information according to the second identifier and based on one of the plurality of pieces of second candidate motion information.
According to the decoding method for predicted motion information provided in this application, the elements in the first candidate set include the first candidate motion information and the plurality of pieces of second candidate motion information. With this multi-layer candidate set structure, when more candidates are introduced, a whole class of candidate motion information can be added to the first candidate set as a single element, which greatly shortens the length of the first candidate set compared with directly adding each piece of candidate motion information to the first candidate set. When the first candidate set is a candidate motion information list for inter prediction, even if more candidates are introduced, the length of the candidate motion information list can be well controlled, facilitating the checking process and hardware implementation.
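For illustration only (not part of the claims), the two-level decoding flow of the first aspect can be sketched as follows. All names, the offset values, and the `Bitstream` stand-in are hypothetical; real bitstream parsing uses entropy decoding rather than pre-parsed indices.

```python
# Hypothetical offsets of the second candidate motion information, in 1/4-pel units.
PRESET_OFFSETS = [(4, 0), (-4, 0), (0, 4), (0, -4)]

class Bitstream:
    """Minimal stand-in that yields pre-parsed identifier values."""
    def __init__(self, indices):
        self._it = iter(indices)
    def read_index(self):
        return next(self._it)

def decode_target_motion(bitstream, first_candidates):
    """first_candidates: list of (mvx, mvy) first candidate motion information.
    Conceptually, the first candidate set contains these candidates plus ONE
    element standing for all offset-based second candidates, so the first-level
    list stays short no matter how many offsets exist."""
    first_id = bitstream.read_index()            # first identifier
    if first_id < len(first_candidates):
        return first_candidates[first_id]        # target element is a first candidate
    # Target element is derived from the second candidates: parse the second
    # identifier and add the selected preset offset to the first motion
    # information (here taken as the first candidate in the list).
    second_id = bitstream.read_index()           # second identifier
    mvx, mvy = first_candidates[0]               # first motion information
    dx, dy = PRESET_OFFSETS[second_id]
    return (mvx + dx, mvy + dy)
```

For example, with two first candidates, a first identifier of 1 returns the second Merge candidate directly, whereas a first identifier of 2 triggers parsing of the second identifier and applies an offset to the base motion vector.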
In a feasible implementation of the first aspect, the first identifier may be a category identifier, used to indicate the category to which the target element belongs.
In a feasible implementation of the first aspect, the decoding method for predicted motion information provided in this embodiment of this application may further include: parsing the bitstream to obtain a fourth identifier, where the fourth identifier is an index of the target element, within the category indicated by the first identifier, in the first candidate set. In this implementation, the target element is uniquely determined by the fourth identifier in combination with the first identifier.
In a feasible implementation of the first aspect, the first candidate motion information includes motion information of a spatially adjacent picture block of the to-be-processed picture block.
In a feasible implementation of the first aspect, the first candidate motion information may be candidate motion information generated in the Merge mode.
In a feasible implementation of the first aspect, the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
In a feasible implementation of the first aspect, the determining the target motion information according to the second identifier and based on one of the plurality of pieces of second candidate motion information includes: determining a target offset from a plurality of preset motion information offsets according to the second identifier; and determining the target motion information based on the first motion information and the target offset.
In a feasible implementation of the first aspect, among the at least one piece of first candidate motion information, the encoding codeword used to identify the first motion information is the shortest.
In a feasible implementation of the first aspect, when the target element is obtained according to the plurality of pieces of second candidate motion information, the decoding method for predicted motion information provided in this application may further include: parsing the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
In a feasible implementation of the first aspect, before the determining the target motion information according to the second identifier and based on one of the plurality of pieces of second candidate motion information, the method further includes: multiplying the plurality of preset motion information offsets by the preset coefficient, to obtain a plurality of adjusted motion information offsets.
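The offset-adjustment step above can be sketched as follows; this is an illustrative example only, and the offset values and coefficient are hypothetical, not values mandated by the method.

```python
# Hypothetical preset offsets, in 1/4-pel units.
PRESET_OFFSETS = [(4, 0), (-4, 0), (0, 4), (0, -4)]

def select_target_motion(first_motion, coeff, second_id):
    """Scale every preset offset by the preset coefficient carried in the
    third identifier, then use the second identifier to pick the target
    offset and add it to the first motion information."""
    adjusted = [(dx * coeff, dy * coeff) for dx, dy in PRESET_OFFSETS]
    dx, dy = adjusted[second_id]
    return (first_motion[0] + dx, first_motion[1] + dy)
```

With a coefficient of 2, the second offset (-4, 0) becomes (-8, 0), so a base motion vector of (8, 8) and a second identifier of 1 yield a target motion vector of (0, 8).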
In a feasible implementation of the first aspect, the target motion information being used to predict the motion information of the to-be-processed picture block includes: using the target motion information as the motion information of the to-be-processed picture block; or using the target motion information as predicted motion information of the to-be-processed picture block. After the motion information or the predicted motion information of the to-be-processed picture block is obtained, motion compensation is performed to generate the picture block or its prediction block.
In a feasible implementation of the first aspect, the second identifier may use fixed-length coding, which saves bytes occupied by the identifier.
In a feasible implementation of the first aspect, the second identifier may use variable-length coding, which allows more candidate motion information to be identified.
According to a second aspect of the embodiments of this application, another decoding method for predicted motion information is provided, including: parsing a bitstream to obtain a first identifier; determining a target element from a first candidate set according to the first identifier, where elements in the first candidate set include at least one piece of first candidate motion information and at least one second candidate set, and elements in the second candidate set include a plurality of pieces of second candidate motion information; when the target element is the first candidate motion information, using the first candidate motion information serving as the target element as target motion information, where the target motion information is used to predict motion information of a to-be-processed picture block; and when the target element is the second candidate set, parsing the bitstream to obtain a second identifier, and determining the target motion information from the plurality of pieces of second candidate motion information according to the second identifier.
According to the decoding method for predicted motion information provided in this application, the elements in the first candidate set include the first candidate motion information and the at least one second candidate set. With this multi-layer candidate set structure, when more candidates are introduced, a whole class of candidate motion information can be added to the first candidate set as a single element, which greatly shortens the length of the first candidate set compared with directly adding each piece of candidate motion information to the first candidate set. When the first candidate set is a candidate motion information list for inter prediction, even if more candidates are introduced, the length of the candidate motion information list can be well controlled, facilitating the checking process and hardware implementation.
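As a purely illustrative data layout (names and values hypothetical), the nested structure of the second aspect can be sketched as a first-level list that mixes individual first candidates with whole second candidate sets, so each nested set costs only one first-level entry:

```python
# First candidate set: two first candidates plus one second candidate set.
first_candidate_set = [
    ("mv", (10, -3)),                         # first candidate motion information
    ("mv", (2, 5)),                           # first candidate motion information
    ("set", [(12, -1), (9, 0), (11, -6)]),    # second candidate set (3 candidates)
]

def resolve(first_id, second_id=None):
    """Resolve the target motion information from the two identifiers."""
    kind, payload = first_candidate_set[first_id]
    if kind == "mv":
        return payload                # target element is a first candidate
    return payload[second_id]         # second identifier indexes into the set

# The first-level list length stays 3 even though 5 motion candidates are
# reachable, which is the length-control benefit described above.
```

Here `resolve(1)` returns the second first candidate directly, while `resolve(2, 1)` descends into the nested set and returns its second element.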
In a feasible implementation of the second aspect, the first identifier may be a category identifier, used to indicate the category to which the target element belongs.
In a feasible implementation of the second aspect, the decoding method for predicted motion information provided in this embodiment of this application may further include: parsing the bitstream to obtain a fourth identifier, where the fourth identifier is an index of the target element, within the category indicated by the first identifier, in the first candidate set. In this implementation, the target element is uniquely determined by the fourth identifier in combination with the first identifier.
In a feasible implementation of the second aspect, the first candidate motion information includes motion information of a spatially adjacent picture block of the to-be-processed picture block.
In a feasible implementation of the second aspect, the first candidate motion information may be candidate motion information generated in the Merge mode.
In a feasible implementation of the second aspect, the second candidate motion information includes motion information of a spatially non-adjacent picture block of the to-be-processed picture block.
In a feasible implementation of the second aspect, the second candidate motion information may be candidate motion information generated in the Affine Merge mode.
In a feasible implementation of the second aspect, the first candidate motion information includes first motion information, the second candidate motion information includes second motion information, and the second motion information is obtained based on the first motion information and a preset motion information offset.
In a feasible implementation of the second aspect, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset; correspondingly, the determining the target motion information from the plurality of pieces of second candidate motion information according to the second identifier includes: determining a target offset from a plurality of preset motion information offsets according to the second identifier; and determining the target motion information based on the first motion information and the target offset.
In a feasible implementation of the second aspect, the first candidate motion information includes first motion information, the at least one second candidate set included in the first candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one third candidate set and at least one fourth candidate set, where elements in the third candidate set include motion information of a plurality of spatially non-adjacent picture blocks of the to-be-processed picture block, and elements in the fourth candidate set include a plurality of pieces of motion information obtained based on the first motion information and preset motion information offsets.
In a feasible implementation of the second aspect, among the at least one piece of first candidate motion information, the encoding codeword used to identify the first motion information is the shortest.
In a feasible implementation of the second aspect, the first motion information does not include motion information obtained according to an alternative temporal motion vector prediction (ATMVP) mode.
In a feasible implementation of the second aspect, the at least one second candidate set included in the first candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one fifth candidate set and at least one sixth candidate set, where elements in the fifth candidate set include motion information of a plurality of spatially non-adjacent picture blocks of the to-be-processed picture block, and elements in the sixth candidate set include a plurality of preset motion information offsets.
In a feasible implementation of the second aspect, when the target element is the second candidate set, the decoding method for predicted motion information provided in this application may further include: parsing the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
In a feasible implementation of the second aspect, before the determining a target offset from a plurality of preset motion information offsets according to the second identifier, the method further includes: multiplying the plurality of preset motion information offsets by the preset coefficient included in the third identifier, to obtain a plurality of adjusted motion information offsets; correspondingly, the determining a target offset from a plurality of preset motion information offsets according to the second identifier includes: determining the target offset from the plurality of adjusted motion information offsets according to the second identifier.
In a feasible implementation of the second aspect, the second candidate motion information is different from the first candidate motion information. Specifically, the first candidate motion information and the second candidate motion information may be candidate motion information selected according to different inter prediction modes.
In a feasible implementation of the second aspect, the target motion information being used to predict the motion information of the to-be-processed picture block includes: using the target motion information as the motion information of the to-be-processed picture block; or using the target motion information as predicted motion information of the to-be-processed picture block. After the motion information or the predicted motion information of the to-be-processed picture block is obtained, motion compensation is performed to generate the picture block or its prediction block.
In a feasible implementation of the second aspect, the second identifier may use fixed-length coding, which saves bytes occupied by the identifier.
In a feasible implementation of the second aspect, the second identifier may use variable-length coding, which allows more candidate motion information to be identified.
It should be noted that the specific implementations of the decoding methods for predicted motion information provided in the foregoing first aspect and second aspect may refer to each other, and details are not described again one by one.
According to a third aspect of the embodiments of this application, a decoding apparatus for predicted motion information is provided, including: a parsing module, configured to parse a bitstream to obtain a first identifier; a determining module, configured to determine a target element from a first candidate set according to the first identifier, where elements in the first candidate set include at least one piece of first candidate motion information and a plurality of pieces of second candidate motion information, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset; and an assignment module, configured to: when the target element is the first candidate motion information, use the first candidate motion information as target motion information, where the target motion information is used to predict motion information of a to-be-processed picture block. The parsing module is further configured to: when the target element is obtained according to the plurality of pieces of second candidate motion information, parse the bitstream to obtain a second identifier, and determine the target motion information according to the second identifier and based on one of the plurality of pieces of second candidate motion information.
According to the decoding apparatus for predicted motion information provided in this application, the elements in the first candidate set include the first candidate motion information and the plurality of pieces of second candidate motion information. With this multi-layer candidate set structure, when more candidates are introduced, a whole class of candidate motion information can be added to the first candidate set as a single element, which greatly shortens the length of the first candidate set compared with directly adding each piece of candidate motion information to the first candidate set. When the first candidate set is a candidate motion information list for inter prediction, even if more candidates are introduced, the length of the candidate motion information list can be well controlled, facilitating the checking process and hardware implementation.
In a feasible implementation of the third aspect, the first candidate motion information may include motion information of a spatially adjacent picture block of the to-be-processed picture block.
In a feasible implementation of the third aspect, the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
In a feasible implementation of the third aspect, the parsing module is specifically configured to: determine a target offset from a plurality of preset motion information offsets according to the second identifier; and determine the target motion information based on the first motion information and the target offset.
In a feasible implementation of the third aspect, among the at least one piece of first candidate motion information, the encoding codeword used to identify the first motion information is the shortest.
In a feasible implementation of the third aspect, when the target element is obtained according to the plurality of pieces of second candidate motion information, the parsing module is further configured to parse the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
In a feasible implementation of the third aspect, the apparatus further includes a calculation module, configured to multiply the plurality of preset motion information offsets by the preset coefficient, to obtain a plurality of adjusted motion information offsets.
In a feasible implementation of the third aspect, the determining module is specifically configured to: determine the target offset, according to the second identifier, from the plurality of adjusted motion information offsets obtained by the calculation module; and then determine the target motion information based on the first motion information and the target offset.
In a feasible implementation of the third aspect, the determining module is specifically configured to: use the target motion information as the motion information of the to-be-processed picture block; or use the target motion information as predicted motion information of the to-be-processed picture block.
In a feasible implementation of the third aspect, the second identifier uses fixed-length coding.
In a feasible implementation of the third aspect, the second identifier uses variable-length coding.
It should be noted that the decoding apparatus for predicted motion information provided in the third aspect of the embodiments of this application is configured to perform the decoding method for predicted motion information provided in the foregoing first aspect. The specific implementations are the same and are not described again one by one.
本申请实施例的第四方面,提供了一种预测运动信息的解码装置,包括:解析模块,用于解析码流以获得第一标识;确定模块,用于根据第一标识,从第一候选集合中确定目标元素,第一候选集合中的元素包括至少一个第一候选运动信息和至少一个第二候选集合,第二候选集合中的元素包括多个第二候选运动信息;赋值模块,当目标元素为第一候选运动信息时,用于将第一候选运动信息作为目标运动信息,该目标运动信息用来预测待处理图像块的运动信息;解析模块还用于,当目标元素为第二候选集合时,解析码流以获得第二标识,确定模块还用于根据第二标识,从所述多个第二候选运动信息中确定目标运动信息。According to a fourth aspect of the embodiments of the present application, a decoding apparatus for predicting motion information is provided, including: a parsing module for parsing a bitstream to obtain a first identifier; and a determining module for parsing a first candidate from the first candidate according to the first identifier. The target element is determined in the set. The elements in the first candidate set include at least one first candidate motion information and at least one second candidate set. The elements in the second candidate set include a plurality of second candidate motion information; the assignment module, when the target When the element is the first candidate motion information, it is used to use the first candidate motion information as the target motion information, and the target motion information is used to predict the motion information of the image block to be processed; the analysis module is further configured to, when the target element is the second candidate During assembly, the code streams are parsed to obtain a second identifier, and the determining module is further configured to determine target motion information from the plurality of second candidate motion information according to the second identifier.
通过本申请提供的预测运动信息的解码装置,第一候选集合中的元素包括了第一 候选运动信息以及至少一个第二候选集合,这样一来,多层候选集合的结构,当引入更多候选时,可以将一类候选运动信息的集合作为一个元素添加在第一候选集合中,相比于直接将候选运动信息加入第一候选集合,大大所选了第一候选集合的长度。当第一候选集合为帧间预测的候选运动信息列表时,即使引起更多的候选,也可以很好的控制候选运动信息列表的长度,为检测过程和硬件实现提供便利。Through the decoding apparatus for predicting motion information provided in this application, the elements in the first candidate set include the first candidate motion information and at least one second candidate set. In this way, the structure of the multi-layer candidate set, when more candidates are introduced In this case, a type of candidate motion information set can be added as an element to the first candidate set. Compared with directly adding the candidate motion information to the first candidate set, the length of the first candidate set is greatly selected. When the first candidate set is a candidate motion information list for inter prediction, even if more candidates are caused, the length of the candidate motion information list can be well controlled, which facilitates the detection process and hardware implementation.
在第四方面的一种可行的实施方式中,第一候选运动信息可以包括待处理图像块的空域相邻图像块的运动信息。In a feasible implementation manner of the fourth aspect, the first candidate motion information may include motion information of spatially adjacent image blocks of the image block to be processed.
在第四方面的一种可行的实施方式中,第二候选运动信息可以包括所述待处理图像块的空域非相邻图像块的运动信息。In a feasible implementation manner of the fourth aspect, the second candidate motion information may include motion information of a spatial domain non-adjacent image block of the image block to be processed.
在第四方面的一种可行的实施方式中，第一候选运动信息包括第一运动信息，第二候选运动信息包括第二运动信息，第二运动信息基于第一运动信息和预设的运动信息偏移量获得。In a feasible implementation manner of the fourth aspect, the first candidate motion information includes first motion information, the second candidate motion information includes second motion information, and the second motion information is obtained based on the first motion information and a preset motion information offset.
在第四方面的一种可行的实施方式中，第一候选运动信息包括第一运动信息，第二候选运动信息包括预设的运动信息偏移量；对应的，解析模块具体用于：根据第二标识从多个预设的运动信息偏移量中确定目标偏移量；基于第一运动信息和目标偏移量确定目标运动信息。In a feasible implementation manner of the fourth aspect, the first candidate motion information includes first motion information, and the second candidate motion information includes preset motion information offsets; correspondingly, the parsing module is specifically configured to: determine a target offset from a plurality of preset motion information offsets according to the second identifier, and determine the target motion information based on the first motion information and the target offset.
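A minimal sketch of this implementation manner (the names are illustrative assumptions, not taken from the claims): the second identifier selects one preset offset, which is then added to the first motion information, here modeled as a motion vector (x, y):

```python
def derive_target_motion_info(first_mv, preset_offsets, second_id):
    """Select the target offset by the second identifier and add it to the
    first motion information to obtain the target motion information."""
    dx, dy = preset_offsets[second_id]
    return (first_mv[0] + dx, first_mv[1] + dy)
```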
在第四方面的一种可行的实施方式中，第一候选运动信息包括第一运动信息，至少一个第二候选集合为多个第二候选集合，多个第二候选集合包括至少一个第三候选集合和至少一个第四候选集合，第三候选集合中的元素包括多个待处理图像块的空域非相邻图像块的运动信息，第四候选集合中的元素包括多个基于第一运动信息和预设的运动信息偏移量获得的运动信息。In a feasible implementation manner of the fourth aspect, the first candidate motion information includes first motion information, the at least one second candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one third candidate set and at least one fourth candidate set, where the elements in the third candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed, and the elements in the fourth candidate set include a plurality of pieces of motion information obtained based on the first motion information and preset motion information offsets.
在第四方面的一种可行的实施方式中,在至少一个第一候选运动信息中,用于标识第一运动信息的编码码字最短。In a feasible implementation manner of the fourth aspect, among the at least one first candidate motion information, an encoding codeword for identifying the first motion information is shortest.
在第四方面的一种可行的实施方式中,第一运动信息不包括根据ATMVP模式获得的运动信息。In a feasible implementation manner of the fourth aspect, the first motion information does not include motion information obtained according to the ATMVP mode.
在第四方面的一种可行的实施方式中，至少一个第二候选集合为多个第二候选集合，多个第二候选集合包括至少一个第五候选集合和至少一个第六候选集合，第五候选集合中的元素包括多个待处理图像块的空域非相邻图像块的运动信息，第六候选集合中的元素包括多个预设的运动信息偏移量。In a feasible implementation manner of the fourth aspect, the at least one second candidate set is a plurality of second candidate sets, and the plurality of second candidate sets include at least one fifth candidate set and at least one sixth candidate set, where the elements in the fifth candidate set include motion information of a plurality of spatially non-adjacent image blocks of the image block to be processed, and the elements in the sixth candidate set include a plurality of preset motion information offsets.
在第四方面的一种可行的实施方式中,当目标元素为第二候选集合时,解析模块还用于:解析码流以获得第三标识,第三标识包括预设系数。In a feasible implementation manner of the fourth aspect, when the target element is the second candidate set, the parsing module is further configured to parse the code stream to obtain a third identifier, and the third identifier includes a preset coefficient.
在第四方面的一种可行的实施方式中，还包括计算模块，用于将多个预设的运动信息偏移量和预设系数相乘，以得到多个调整后的运动信息偏移量；对应的，确定模块具体用于，根据第二标识从计算模块得到的多个调整后的运动信息偏移量中确定目标偏移量，再基于第一运动信息和目标偏移量确定目标运动信息。In a feasible implementation manner of the fourth aspect, the apparatus further includes a calculation module, configured to multiply a plurality of preset motion information offsets by a preset coefficient to obtain a plurality of adjusted motion information offsets; correspondingly, the determining module is specifically configured to determine a target offset from the plurality of adjusted motion information offsets obtained by the calculation module according to the second identifier, and then determine the target motion information based on the first motion information and the target offset.
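The variant with the calculation module can be sketched the same way (again purely illustrative): every preset offset is first multiplied by the preset coefficient carried by the third identifier, and the second identifier then selects from the adjusted offsets:

```python
def derive_with_coefficient(first_mv, preset_offsets, coeff, second_id):
    """Multiply all preset offsets by the preset coefficient, then select
    the target offset by the second identifier and apply it to the first
    motion information."""
    adjusted = [(dx * coeff, dy * coeff) for dx, dy in preset_offsets]
    dx, dy = adjusted[second_id]
    return (first_mv[0] + dx, first_mv[1] + dy)
```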
在第四方面的一种可行的实施方式中,第二候选运动信息和第一候选运动信息不相同。In a feasible implementation manner of the fourth aspect, the second candidate motion information and the first candidate motion information are different.
在第四方面的一种可行的实施方式中,确定模块具体用于,将目标运动信息作为待处理图像块的运动信息;或者,将目标运动信息作为待处理图像块的预测运动信息。In a feasible implementation manner of the fourth aspect, the determining module is specifically configured to use the target motion information as the motion information of the image block to be processed; or use the target motion information as the predicted motion information of the image block to be processed.
在第四方面的一种可行的实施方式中,第二标识采用定长编码方式。In a feasible implementation manner of the fourth aspect, the second identifier adopts a fixed-length encoding manner.
在第四方面的一种可行的实施方式中,第二标识采用变长编码方式。In a feasible implementation manner of the fourth aspect, the second identifier adopts a variable length coding method.
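To illustrate the difference between the two signaling options (these binarizations are common examples, not asserted to be the ones used by the embodiments): a fixed-length code spends the same number of bits on every index of the second identifier, while a variable-length code such as truncated unary spends fewer bits on smaller, more probable indices:

```python
import math

def fixed_length_code(index, num_candidates):
    """Fixed-length binarization: every index costs ceil(log2(N)) bits."""
    bits = max(1, math.ceil(math.log2(num_candidates)))
    return format(index, '0{}b'.format(bits))

def truncated_unary_code(index, num_candidates):
    """Truncated unary binarization: index k costs k+1 bits (k bits for the
    last index), so smaller indices get shorter codewords."""
    return '1' * index + ('' if index == num_candidates - 1 else '0')
```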
本申请实施例的第五方面，提供了一种预测运动信息的解码装置，包括：处理器和耦合于所述处理器的存储器；所述处理器用于执行上述第一方面或第二方面所述的预测运动信息的解码方法。According to a fifth aspect of the embodiments of the present application, a decoding apparatus for predicting motion information is provided, including a processor and a memory coupled to the processor, where the processor is configured to perform the decoding method for predicting motion information according to the first aspect or the second aspect.
本申请实施例的第六方面，提供一种视频解码器，包括非易失性存储介质以及中央处理器，所述非易失性存储介质存储有可执行程序，所述中央处理器与所述非易失性存储介质连接，并执行如上述第一方面和/或第二方面或任意一种可能的实现方式所述的预测运动信息的解码方法。According to a sixth aspect of the embodiments of the present application, a video decoder is provided, including a non-volatile storage medium and a central processing unit, where the non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and performs the decoding method for predicting motion information according to the first aspect and/or the second aspect or any possible implementation manner thereof.
本申请实施例的第七方面，提供了一种计算机可读存储介质，所述计算机可读存储介质中存储有指令，当所述指令在计算机上运行时，使得计算机执行上述第一方面或第二方面所述的预测运动信息的解码方法。According to a seventh aspect of the embodiments of the present application, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions, and when the instructions are run on a computer, the computer is caused to perform the decoding method for predicting motion information according to the first aspect or the second aspect.
本申请实施例的第八方面，提供了一种包含指令的计算机程序产品，当所述指令在计算机上运行时，使得计算机执行上述第一方面或第二方面所述的预测运动信息的解码方法。According to an eighth aspect of the embodiments of the present application, a computer program product including instructions is provided, and when the instructions are run on a computer, the computer is caused to perform the decoding method for predicting motion information according to the first aspect or the second aspect.
应理解,本申请的第三至八方面与本申请的第一方面或第二方面的技术方案一致,各方面及对应的可实施的设计方式所取得的有益效果相似,不再赘述。It should be understood that the third to eighth aspects of the present application are consistent with the technical solutions of the first aspect or the second aspect of this application, and the beneficial effects obtained by each aspect and the corresponding implementable design manner are similar and will not be described again.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1为示例性的可通过配置以用于本申请实施例的一种视频译码系统框图；FIG. 1 is an exemplary block diagram of a video decoding system that can be configured for use in an embodiment of the present application;
图2为示例性的可通过配置以用于本申请实施例的一种视频编码器的系统框图；FIG. 2 is an exemplary system block diagram of a video encoder that can be configured for use in an embodiment of the present application;
图3为示例性的可通过配置以用于本申请实施例的一种视频解码器的系统框图；FIG. 3 is an exemplary system block diagram of a video decoder that can be configured for use in embodiments of the present application;
图4为示例性的可通过配置以用于本申请实施例的一种帧间预测模块的框图;4 is a block diagram of an exemplary inter prediction module that can be configured for use in an embodiment of the present application;
图5为示例性的一种合并预测模式的实施流程图;5 is an exemplary implementation flowchart of a merge prediction mode;
图6为示例性的一种高级运动矢量预测模式的实施流程图;6 is an exemplary implementation flowchart of an advanced motion vector prediction mode;
图7为示例性的可通过配置以用于本申请实施例的一种由视频解码器执行的运动补偿的实施流程图;7 is an exemplary implementation flowchart of a motion compensation performed by a video decoder that can be configured for an embodiment of the present application;
图8为示例性的一种编码单元及与其关联的相邻位置图像块的示意图;FIG. 8 is a schematic diagram of an exemplary coding unit and adjacent position image blocks associated with the coding unit;
图9为示例性的一种构建候选预测运动矢量列表的实施流程图;9 is an exemplary implementation flowchart of constructing a candidate prediction motion vector list;
图10为示例性的一种将经过组合的候选运动矢量添加到合并模式候选预测运动矢量列表的实施示意图;10 is an exemplary implementation diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list;
图11为示例性的一种将经过缩放的候选运动矢量添加到合并模式候选预测运动矢量列表的实施示意图;11 is an exemplary implementation diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list;
图12为示例性的一种将零运动矢量添加到合并模式候选预测运动矢量列表的实施示意图;12 is an exemplary implementation diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list;
图13为示例性的另一种编码单元及与其关联的相邻位置图像块的示意图;FIG. 13 is a schematic diagram of another exemplary coding unit and adjacent position image blocks associated with the coding unit;
图14A为示例性的一种构建候选运动矢量集合方法的示意图;14A is a schematic diagram of an exemplary method for constructing a candidate motion vector set;
图14B为示例性的一种构建候选运动矢量集合方法的示意图;14B is a schematic diagram of an exemplary method for constructing a candidate motion vector set;
图15为本申请实施例中预测运动信息的解码方法的一个示意性流程图;15 is a schematic flowchart of a decoding method for predicting motion information according to an embodiment of the present application;
图16A为示例性的一种构建候选运动矢量集合方法的示意图;16A is a schematic diagram of an exemplary method for constructing a candidate motion vector set;
图16B为示例性的一种构建候选运动矢量集合方法的示意图;16B is a schematic diagram of an exemplary method for constructing a candidate motion vector set;
图16C为示例性的一种构建候选运动矢量集合方法的示意图;16C is a schematic diagram of an exemplary method for constructing a candidate motion vector set;
图17为本申请实施例中预测运动信息的解码装置的一个示意性框图；FIG. 17 is a schematic block diagram of a decoding apparatus for predicting motion information according to an embodiment of the present application;
图18为本申请实施例中预测运动信息的解码装置的一个示意性框图。FIG. 18 is a schematic block diagram of a decoding apparatus for predicting motion information according to an embodiment of the present application.
具体实施方式DETAILED DESCRIPTION
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于限定特定顺序。The terms "first", "second", "third", and "fourth" in the description and claims of the present application and the above-mentioned drawings are used to distinguish different objects, rather than to define a specific order.
在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。In the embodiments of the present application, words such as "exemplary" or "for example" are used as examples, illustrations or illustrations. Any embodiment or design described as “exemplary” or “for example” in the embodiments of the present application should not be construed as more preferred or more advantageous than other embodiments or designs. Rather, the use of the words "exemplary" or "for example" is intended to present the relevant concept in a concrete manner.
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
图1为本申请实施例中所描述的一种实例的视频译码系统1的框图。如本文所使用，术语“视频译码器”一般是指视频编码器和视频解码器两者。在本申请中，术语“视频译码”或“译码”可一般地指代视频编码或视频解码。视频译码系统1的视频编码器100和视频解码器200用于根据多种新的帧间预测模式中的任一种预测当前经译码图像块或其子块的运动信息，例如运动矢量，使得预测出的运动矢量最大程度上接近使用运动估算方法得到的运动矢量，从而编码时无需传送运动矢量差值，从而进一步的改善编解码性能。FIG. 1 is a block diagram of an example video decoding system 1 described in the embodiments of the present application. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this application, the terms "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video decoding system 1 are configured to predict motion information, for example a motion vector, of a currently coded image block or a sub-block thereof according to any one of multiple new inter prediction modes, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; it is therefore not necessary to transmit a motion vector difference during encoding, which further improves encoding and decoding performance.
如图1中所示,视频译码系统1包含源装置10和目的地装置20。源装置10产生经编码视频数据。因此,源装置10可被称为视频编码装置。目的地装置20可对由源装置10所产生的经编码的视频数据进行解码。因此,目的地装置20可被称为视频解码装置。源装置10、目的地装置20或两个的各种实施方案可包含一或多个处理器以及耦合到所述一或多个处理器的存储器。所述存储器可包含但不限于RAM、ROM、EEPROM、快闪存储器或可用于以可由计算机存取的指令或数据结构的形式存储所要的程序代码的任何其它媒体,如本文所描述。As shown in FIG. 1, the video decoding system 1 includes a source device 10 and a destination device 20. The source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device. The destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other media that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
源装置10和目的地装置20可以包括各种装置，包含桌上型计算机、移动计算装置、笔记型（例如，膝上型）计算机、平板计算机、机顶盒、例如所谓的“智能”电话等电话手持机、电视机、相机、显示装置、数字媒体播放器、视频游戏控制台、车载计算机或其类似者。The source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
目的地装置20可经由链路30从源装置10接收经编码视频数据。链路30可包括能够将经编码视频数据从源装置10移动到目的地装置20的一或多个媒体或装置。在一个实例中，链路30可包括使得源装置10能够实时将经编码视频数据直接发射到目的地装置20的一或多个通信媒体。在此实例中，源装置10可根据通信标准（例如无线通信协议）来调制经编码视频数据，且可将经调制的视频数据发射到目的地装置20。所述一或多个通信媒体可包含无线和/或有线通信媒体，例如射频（radio frequency，RF）频谱或一或多个物理传输线。所述一或多个通信媒体可形成基于分组的网络的一部分，基于分组的网络例如为局域网、广域网或全球网络（例如，因特网）。所述一或多个通信媒体可包含路由器、交换器、基站或促进从源装置10到目的地装置20的通信的其它设备。The destination device 20 may receive the encoded video data from the source device 10 via the link 30. The link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
在另一实例中,可将经编码数据从输出接口140输出到存储装置40。类似地,可通过输入接口240从存储装置40存取经编码数据。存储装置40可包含多种分布式或本地存取的数据存储媒体中的任一者,例如硬盘驱动器、蓝光光盘、数字通用光盘(digital video disc,DVD)、只读光盘(compact disc read-only memory,CD-ROM)、快闪存储器、易失性或非易失性存储器,或用于存储经编码视频数据的任何其它合适的数字存储媒体。In another example, the encoded data may be output from the output interface 140 to the storage device 40. Similarly, the encoded data can be accessed from the storage device 40 through the input interface 240. The storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a digital video disc (DVD), and a compact disc-read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
在另一实例中，存储装置40可对应于文件服务器或可保持由源装置10产生的经编码视频的另一中间存储装置。目的地装置20可经由流式传输或下载从存储装置40存取所存储的视频数据。文件服务器可为任何类型的能够存储经编码的视频数据并且将经编码的视频数据发射到目的地装置20的服务器。实例文件服务器包含网络服务器（例如，用于网站）、文件传输协议（file transfer protocol，FTP）服务器、网络附接式存储（network attached storage，NAS）装置或本地磁盘驱动器。目的地装置20可通过任何标准数据连接（包含因特网连接）来存取经编码视频数据。这可包含无线信道（例如，无线保真（wireless-fidelity，Wi-Fi）连接）、有线连接（例如，数字用户线路（digital subscriber line，DSL）、电缆调制解调器等），或适合于存取存储在文件服务器上的经编码视频数据的两者的组合。经编码视频数据从存储装置40的传输可为流式传输、下载传输或两者的组合。In another example, the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10. The destination device 20 may access the stored video data from the storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20. Example file servers include a network server (for example, for a website), a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive. The destination device 20 can access the encoded video data through any standard data connection, including an Internet connection. This may include wireless channels (e.g., wireless-fidelity (Wi-Fi) connections), wired connections (e.g., digital subscriber lines (DSL), cable modems, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
本申请实施例提供的预测运动信息的解码方法可应用于视频编解码以支持多种多媒体应用,例如空中电视广播、有线电视发射、卫星电视发射、串流视频发射(例如,经由因特网)、用于存储于数据存储媒体上的视频数据的编码、存储在数据存储媒体上的视频数据的解码,或其它应用。在一些实例中,视频译码系统1可用于支持单向或双向视频传输以支持例如视频流式传输、视频回放、视频广播和/或视频电话等应用。The decoding method for predicting motion information provided in the embodiments of the present application can be applied to video encoding and decoding to support a variety of multimedia applications, such as air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), Encoding video data stored on a data storage medium, decoding video data stored on a data storage medium, or other applications. In some examples, the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
图1中所说明的视频译码系统1仅为实例，并且本申请的技术可适用于未必包含编码装置与解码装置之间的任何数据通信的视频译码设置（例如，视频编码或视频解码）。在其它实例中，数据从本地存储器检索、在网络上流式传输等等。视频编码装置可对数据进行编码并且将数据存储到存储器，和/或视频解码装置可从存储器检索数据并且对数据进行解码。在许多实例中，由并不彼此通信而是仅编码数据到存储器和/或从存储器检索数据且解码数据的装置执行编码和解码。The video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application can be applied to a video decoding setting (for example, video encoding or video decoding) that does not necessarily include any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local storage, streamed over a network, and so on. The video encoding device may encode the data and store the data to a memory, and/or the video decoding device may retrieve the data from the memory and decode the data. In many instances, encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and/or retrieve data from memory and decode data.
在图1的实例中，源装置10包含视频源120、视频编码器100和输出接口140。在一些实例中，输出接口140可包含调制器/解调器（调制解调器）和/或发射器。视频源120可包括视频捕获装置（例如，摄像机）、含有先前捕获的视频数据的视频存档、用以从视频内容提供者接收视频数据的视频馈入接口，和/或用于产生视频数据的计算机图形系统，或视频数据的此些来源的组合。In the example of FIG. 1, the source device 10 includes a video source 120, a video encoder 100, and an output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
视频编码器100可对来自视频源120的视频数据进行编码。在一些实例中,源装置10经由输出接口140将经编码视频数据直接发射到目的地装置20。在其它实例中,经编码视频数据还可存储到存储装置40上,供目的地装置20以后存取来用于解码和/或播放。The video encoder 100 may encode video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
在图1的实例中,目的地装置20包含输入接口240、视频解码器200和显示装置220。在一些实例中,输入接口240包含接收器和/或调制解调器。输入接口240可经由链路30和/或从存储装置40接收经编码视频数据。显示装置220可与目的地装置20集成或可在目的地装置20外部。一般来说,显示装置220显示经解码视频数据。显示装置220可包括多种显示装置,例如,液晶显示器(liquid crystal display,LCD)、等离子显示器、有机发光二极管(organic light-emitting diode,OLED)显示器或其它类型的显示装置。In the example of FIG. 1, the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and / or a modem. The input interface 240 may receive the encoded video data via the link 30 and / or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data. The display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
尽管图1中未图示,但在一些方面,视频编码器100和视频解码器200可各自与音频编码器和解码器集成,且可包含适当的多路复用器-多路分用器单元或其它硬件和软件,以处置共同数据流或单独数据流中的音频和视频两者的编码。在一些实例中,如果适用的话,那么解复用器(MUX-DEMUX)单元可符合国际电信联盟(international telecommunication union,ITU)H.223多路复用器协议,或例如用户数据报协议(user datagram protocol,UDP)等其它协议。Although not illustrated in FIG. 1, in some aspects, video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer unit Or other hardware and software to handle encoding of both audio and video in a common or separate data stream. In some examples, if applicable, the demultiplexer (MUX-DEMUX) unit may conform to the International Telecommunication Union (ITU) H.223 multiplexer protocol, or, for example, the user datagram protocol (user) datagram protocol, UDP) and other protocols.
视频编码器100和视频解码器200各自可实施为例如以下各项的多种电路中的任一者:一或多个微处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application-specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)、离散逻辑、硬件或其任何组合。如果部分地以软件来实施本申请,那么装置可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中,且可使用一或多个处理器在硬件中执行所述指令从而实施本申请技术。前述内容(包含硬件、软件、硬件与软件的组合等)中的任一者可被视为一或多个处理器。视频编码器100和视频解码器200中的每一者可包含在一或多个编码器或解码器中,所述编码器或解码器中的任一者可集成为相应装置中的组合编码器/解码器(编码解码器)的一部分。Each of the video encoder 100 and the video decoder 200 may be implemented as any of a variety of circuits such as one or more microprocessors, digital signal processors (DSPs), and application specific integrated circuits. (application-specific integrated circuit (ASIC)), field programmable gate array (FPGA), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium and may use one or more processors to execute the instructions in hardware Thus implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, any of which may be integrated as a combined encoder in a corresponding device / Decoder (codec).
本申请可大体上将视频编码器100称为将某些信息“发信号通知”或“发射”到例如视频解码器200的另一装置。术语“发信号通知”或“发射”可大体上指代用以对经压缩视频数据进行解码的语法元素和/或其它数据的传送。此传送可实时或几乎实时地发生。替代地，此通信可经过一段时间后发生，例如可在编码时在经编码码流中将语法元素存储到计算机可读存储媒体时发生，解码装置接着可在所述语法元素存储到此媒体之后的任何时间检索所述语法元素。This application may generally refer to video encoder 100 as "signaling" or "transmitting" certain information to another device, such as video decoder 200. The terms "signaling" or "transmitting" may generally refer to the transmission of syntax elements and/or other data used to decode the compressed video data. This transfer can occur in real time or almost real time. Alternatively, this communication may occur over a period of time, for example when syntax elements are stored in an encoded bitstream to a computer-readable storage medium at encoding time, and the decoding device may then retrieve the syntax elements at any time after they are stored on this medium.
JCT-VC开发了H.265（高效率视频编码（high efficiency video coding，HEVC））标准。HEVC标准化基于称作HEVC测试模型（HEVC model，HM）的视频解码装置的演进模型。H.265的最新标准文档可从http://www.itu.int/rec/T-REC-H.265获得，最新版本的标准文档为H.265（12/16），该标准文档以全文引用的方式并入本文中。HM假设视频解码装置相对于ITU-TH.264/AVC的现有算法具有若干额外能力。例如，H.264提供9种帧内预测编码模式，而HM可提供多达35种帧内预测编码模式。JCT-VC has developed the H.265 (high efficiency video coding, HEVC) standard. The HEVC standardization is based on an evolution model of a video decoding device called the HEVC test model (HM). The latest standard document for H.265 is available from http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and that standard document is incorporated herein by reference in its entirety. HM assumes that video decoding devices have several additional capabilities over the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM can provide up to 35 intra-prediction encoding modes.
JVET致力于开发H.266标准。H.266标准化的过程基于称作H.266测试模型的视频解码装置的演进模型。H.266的算法描述可从http://phenix.int-evry.fr/jvet获得，其中最新的算法描述包含于JVET-F1001-v2中，该算法描述文档以全文引用的方式并入本文中。同时，可从https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/获得JEM测试模型的参考软件，同样以全文引用的方式并入本文中。JVET is committed to developing the H.266 standard. The process of H.266 standardization is based on an evolution model of a video decoding device called the H.266 test model. The algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet; the latest algorithm description is included in JVET-F1001-v2, and that algorithm description document is incorporated herein by reference in its entirety. At the same time, the reference software for the JEM test model is available from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/ and is also incorporated herein by reference in its entirety.
一般来说，HM的工作模型描述可将视频帧或图像划分成包含亮度及色度样本两者的树块或最大编码单元（largest coding unit，LCU）的序列，LCU也被称为编码树单元（coding tree unit，CTU）。树块具有与H.264标准的宏块类似的目的。条带包含按解码次序的数个连续树块。可将视频帧或图像分割成一个或多个条带。可根据四叉树将每一树块分裂成编码单元。例如，可将作为四叉树的根节点的树块分裂成四个子节点，且每一子节点可又为母节点且被分裂成另外四个子节点。作为四叉树的叶节点的最终不可分裂的子节点包括解码节点，例如，经解码视频块。与经解码码流相关联的语法数据可定义树块可分裂的最大次数，且也可定义解码节点的最小大小。In general, the working model of HM describes that a video frame or image may be divided into a sequence of tree blocks, or largest coding units (LCUs), containing both luma and chroma samples; an LCU is also referred to as a coding tree unit (CTU). A tree block has a purpose similar to that of a macroblock of the H.264 standard. A slice contains several consecutive tree blocks in decoding order. A video frame or image can be split into one or more slices. Each tree block can be split into coding units according to a quadtree. For example, a tree block that is the root node of a quadtree may be split into four child nodes, and each child node may in turn be a parent node split into another four child nodes. The final, non-splittable child nodes, the leaf nodes of the quadtree, comprise decoding nodes, e.g., decoded video blocks. Syntax data associated with the decoded bitstream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
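The quadtree splitting described above can be sketched as a simple recursion (illustrative only; `should_split` stands in for the encoder's actual split decision):

```python
def quadtree_leaves(x, y, size, min_size, should_split):
    """Recursively split a square tree block: a node either stays a leaf
    (a decoding node / CU) or splits into four equal child squares."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree_leaves(x + dx, y + dy, half, min_size, should_split)
    return leaves
```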
编码单元包含解码节点及预测块(prediction unit,PU)以及与解码节点相关联的变换单元(transform unit,TU)。CU的大小对应于解码节点的大小且形状必须为正方形。CU的大小的范围可为8×8像素直到最大64×64像素或更大的树块的大小。每一CU可含有一个或多个PU及一个或多个TU。例如,与CU相关联的语法数据可描述将CU分割成一个或多个PU的情形。分割模式在CU是被跳过或经直接模式编码、帧内预测模式编码或帧间预测模式编码的情形之间可为不同的。PU可经分割成形状为非正方形。例如,与CU相关联的语法数据也可描述根据四叉树将CU分割成一个或多个TU的情形。TU的形状可为正方形或非正方形。The coding unit includes a decoding node and a prediction unit (PU), and a transform unit (TU) associated with the decoding node. The size of the CU corresponds to the size of the decoding node and the shape must be square. The size of the CU can range from 8 × 8 pixels to a maximum 64 × 64 pixels or larger tree block size. Each CU may contain one or more PUs and one or more TUs. For example, the syntax data associated with a CU may describe a case where a CU is partitioned into one or more PUs. The partitioning mode may be different between cases where the CU is skipped or is encoded in direct mode, intra prediction mode, or inter prediction mode. The PU can be divided into non-square shapes. For example, the syntax data associated with a CU may also describe a case where a CU is partitioned into one or more TUs according to a quadtree. The shape of the TU can be square or non-square.
HEVC标准允许根据TU进行变换,TU对于不同CU来说可为不同的。TU通常基于针对经分割LCU定义的给定CU内的PU的大小而设定大小,但情况可能并非总是如此。TU的大小通常与PU相同或小于PU。在一些可行的实施方式中,可使用称作“残余四叉树”(residual qualtree,RQT)的四叉树结构将对应于CU的残余样本再分成较小单元。RQT的叶节点可被称作TU。可变换与TU相关联的像素差值以产生变换系数,变换系数可被量化。The HEVC standard allows transformation based on the TU, which can be different for different CUs. The TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case. The size of the TU is usually the same as or smaller than the PU. In some feasible implementations, a quad-tree structure called "residual quad-tree" (RQT) may be used to subdivide the residual samples corresponding to the CU into smaller units. The leaf node of RQT may be called TU. The pixel difference values associated with the TU may be transformed to produce a transformation coefficient, which may be quantized.
一般来说,PU包含与预测过程有关的数据。例如,在PU经帧内模式编码时,PU可包含描述PU的帧内预测模式的数据。作为另一可行的实施方式,在PU经帧间模式编码时,PU可包含界定PU的运动矢量的数据。例如,界定PU的运动矢量的数据可描述运动矢量的水平分量、运动矢量的垂直分量、运动矢量的分辨率(例如,四分之一像素精确度或八分之一像素精确度)、运动矢量所指向的参考图像,和/或运动矢量的参考图像列表(例如,列表0、列表1或列表C)。Generally speaking, the PU contains data related to the prediction process. For example, when a PU is intra-mode encoded, the PU may include data describing the intra-prediction mode of the PU. As another feasible implementation manner, when the PU is inter-mode encoded, the PU may include data defining a motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel accuracy or eighth-pixel accuracy), motion vector The reference image pointed to, and / or the reference image list of the motion vector (eg, list 0, list 1 or list C).
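The motion-vector fields listed above for an inter-coded PU can be grouped in a small record (field names are hypothetical, chosen only to mirror the description):

```python
from dataclasses import dataclass

@dataclass
class PuMotionData:
    """Illustrative container for the data defining a PU's motion vector."""
    mv_horizontal: int    # horizontal motion vector component (sub-pel units)
    mv_vertical: int      # vertical motion vector component
    mv_resolution: str    # e.g. 'quarter-pel' or 'eighth-pel'
    ref_pic_index: int    # reference picture the motion vector points to
    ref_pic_list: str     # 'list0', 'list1', or 'listC'
```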
一般来说,TU使用变换及量化过程。具有一个或多个PU的给定CU也可包含一个或多个TU。在预测之后,视频编码器100可计算对应于PU的残余值。残余值包括像素差值,像素差值可变换成变换系数、经量化且使用TU扫描以产生串行化变换系数以用于熵解码。本申请通常使用术语“视频块”来指CU的解码节点。在一些特定应用中,本申请也可使用术语“视频块”来指包含解码节点以及PU及TU的树块,例如,LCU或CU。Generally, TU uses transform and quantization processes. A given CU with one or more PUs may also contain one or more TUs. After prediction, video encoder 100 may calculate a residual value corresponding to the PU. The residual values include pixel differences that can be transformed into transform coefficients, quantized, and scanned using TU to generate serialized transform coefficients for entropy decoding. This application generally uses the term "video block" to refer to the decoding node of a CU. In some specific applications, the term “video block” may also be used in this application to refer to a tree block including a decoding node and a PU and a TU, such as an LCU or a CU.
视频序列通常包含一系列视频帧或图像。图像群组(group of picture,GOP)示例性地包括一系列、一个或多个视频图像。GOP可在GOP的头信息中、图像中的一者或多者的头信息中或在别处包含语法数据，语法数据描述包含于GOP中的图像的数目。图像的每一条带可包含描述相应图像的编码模式的条带语法数据。视频编码器100通常对个别视频条带内的视频块进行操作以便编码视频数据。视频块可对应于CU内的解码节点。视频块可具有固定或变化的大小，且可根据指定解码标准而在大小上不同。A video sequence usually contains a series of video frames or pictures. A group of pictures (GOP) illustratively includes a series of one or more video pictures. The GOP may include syntax data in the header information of the GOP, in the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP. Each slice of a picture may contain slice syntax data describing the coding mode of the corresponding picture. The video encoder 100 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
作为一种可行的实施方式，HM支持各种PU大小的预测。假定特定CU的大小为2N×2N，HM支持2N×2N或N×N的PU大小的帧内预测，及2N×2N、2N×N、N×2N或N×N的对称PU大小的帧间预测。HM也支持2N×nU、2N×nD、nL×2N及nR×2N的PU大小的帧间预测的不对称分割。在不对称分割中，CU的一方向未分割，而另一方向分割成25%及75%。对应于25%区段的CU的部分由“n”后跟着“上(Up)”、“下(Down)”、“左(Left)”或“右(Right)”的指示来指示。因此，例如，“2N×nU”指水平分割的2N×2N CU，其中2N×0.5N PU在上部且2N×1.5N PU在底部。As a feasible implementation, the HM supports prediction with various PU sizes. Assuming that the size of a specific CU is 2N×2N, the HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU on top and a 2N×1.5N PU at the bottom.
在本申请中，“N×N”与“N乘N”可互换使用以指依照垂直维度及水平维度的视频块的像素尺寸，例如，16×16像素或16乘16像素。一般来说，16×16块将在垂直方向上具有16个像素(y=16)，且在水平方向上具有16个像素(x=16)。同样地，N×N块一般在垂直方向上具有N个像素，且在水平方向上具有N个像素，其中N表示非负整数值。可将块中的像素排列成行及列。此外，块未必需要在水平方向上与在垂直方向上具有相同数目个像素。例如，块可包括N×M个像素，其中M未必等于N。In this application, "N×N" and "N by N" may be used interchangeably to refer to the pixel dimensions of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in the vertical direction (y=16) and 16 pixels in the horizontal direction (x=16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Furthermore, a block need not have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise N×M pixels, where M is not necessarily equal to N.
在使用CU的PU的帧内预测性或帧间预测性解码之后，视频编码器100可计算CU的TU的残余数据。PU可包括空间域(也称作像素域)中的像素数据，且TU可包括在将变换(例如，离散余弦变换(discrete cosine transform,DCT)、整数变换、小波变换或概念上类似的变换)应用于残余视频数据之后变换域中的系数。残余数据可对应于未经编码图像的像素与对应于PU的预测值之间的像素差。视频编码器100可形成包含CU的残余数据的TU，且接着变换TU以产生CU的变换系数。After intra-predictive or inter-predictive decoding using the PUs of a CU, the video encoder 100 may calculate residual data for the TUs of the CU. A PU may comprise pixel data in the spatial domain (also referred to as the pixel domain), and a TU may comprise coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between the pixels of the unencoded picture and the prediction values corresponding to the PUs. The video encoder 100 may form TUs comprising the residual data of the CU, and then transform the TUs to produce transform coefficients for the CU.
在任何变换以产生变换系数之后，视频编码器100可执行变换系数的量化。量化示例性地指对系数进行量化以可能减少用以表示系数的数据的量从而提供进一步压缩的过程。量化过程可减少与系数中的一些或全部相关联的位深度。例如，可在量化期间将n位值降值舍位到m位值，其中n大于m。Following any transforms to produce transform coefficients, the video encoder 100 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing the coefficients to possibly reduce the amount of data used to represent them, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
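The bit-depth reduction described above can be sketched as follows. This is a minimal illustration only; the function name and the exact rounding rule (simple truncation of the low bits) are our assumptions, not taken from the patent:

```python
def quantize_truncate(value: int, n: int, m: int) -> int:
    """Round an n-bit magnitude down to an m-bit representation
    by discarding the (n - m) least significant bits."""
    assert 0 <= value < (1 << n) and n > m
    return value >> (n - m)

# A 10-bit value 1023 truncated to a 6-bit value: keeps the top 6 bits.
q = quantize_truncate(1023, n=10, m=6)
```

Real codecs combine such bit-depth reduction with a quantization step size; only the n-to-m-bit rounding idea is shown here.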
JEM模型对视频图像的编码结构进行了进一步的改进,具体的,被称为“四叉树结合二叉树”(QTBT)的块编码结构被引入进来。QTBT结构摒弃了HEVC中的CU,PU,TU等概念,支持更灵活的CU划分形状,一个CU可以正方形,也可以是长方形。一个CTU首先进行四叉树划分,该四叉树的叶节点进一步进行二叉树划分。同时,在二叉树划分中存在两种划分模式,对称水平分割和对称竖直分割。二叉树的叶节点被称为CU,JEM的CU在预测和变换的过程中都不可以被进一步划分,也就是说JEM的CU,PU,TU具有相同的块大小。在现阶段的JEM中,CTU的最大尺寸为256×256亮度像素。The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "Quad Tree Combined with Binary Tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC, and supports more flexible CU division shapes. A CU can be square or rectangular. A CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition. At the same time, there are two partitioning modes in binary tree partitioning, symmetrical horizontal partitioning and symmetrical vertical partitioning. The leaf nodes of a binary tree are called CUs, and JEM's CUs cannot be further divided during the prediction and transformation process, that is, JEM's CU, PU, and TU have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luminance pixels.
在一些可行的实施方式中，视频编码器100可利用预定义扫描次序来扫描经量化变换系数以产生可经熵编码的串行化向量。在其它可行的实施方式中，视频编码器100可执行自适应性扫描。在扫描经量化变换系数以形成一维向量之后，视频编码器100可根据上下文自适应性可变长度解码(context-based adaptive variable-length code,CAVLC)、上下文自适应性二进制算术解码(context-based adaptive binary arithmetic coding,CABAC)、基于语法的上下文自适应性二进制算术解码(syntax-based adaptive binary arithmetic coding,SBAC)、概率区间分割熵(probability interval partitioning entropy,PIPE)解码或其他熵解码方法来熵解码一维向量。视频编码器100也可熵编码与经编码视频数据相关联的语法元素以供视频解码器200用于解码视频数据。In some feasible implementations, the video encoder 100 may scan the quantized transform coefficients using a predefined scan order to produce a serialized vector that can be entropy encoded. In other feasible implementations, the video encoder 100 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, the video encoder 100 may entropy decode the one-dimensional vector according to context-based adaptive variable-length coding (CAVLC), context-based adaptive binary arithmetic coding (CABAC), syntax-based adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) decoding, or another entropy decoding method. The video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by the video decoder 200 in decoding the video data.
为了执行CABAC，视频编码器100可将上下文模型内的上下文指派给待传输的符号。上下文可与符号的相邻值是否为非零有关。为了执行CAVLC，视频编码器100可选择待传输的符号的可变长度码。可变长度解码(variable-length code,VLC)中的码字可经构建以使得相对较短码对应于可能性较大的符号，而较长码对应于可能性较小的符号。以这个方式，VLC的使用可相对于针对待传输的每一符号使用相等长度码字达成节省码率的目的。基于指派给符号的上下文可以确定CABAC中的概率。To perform CABAC, the video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to whether neighboring values of the symbol are non-zero. To perform CAVLC, the video encoder 100 may select a variable-length code for the symbol to be transmitted. Codewords in variable-length coding (VLC) may be constructed such that relatively short codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit-rate saving relative to using equal-length codewords for each symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
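As a toy illustration of the codeword-length principle above (the symbols and codewords here are hypothetical; real CAVLC tables are defined by the standard, not by this sketch):

```python
# Hypothetical prefix-free VLC table: shorter codewords for more
# probable symbols, longer codewords for less probable ones.
vlc_table = {"A": "0", "B": "10", "C": "110", "D": "111"}  # "A" most probable

def vlc_encode(symbols):
    # Concatenate the codeword of each symbol into one bit string.
    return "".join(vlc_table[s] for s in symbols)

# Three symbols dominated by the probable "A" cost 5 bits here,
# versus 6 bits with a fixed 2-bit code per symbol.
bits = vlc_encode(["A", "A", "C"])
```

Because the table is prefix-free, the bit string can be decoded unambiguously, which is what makes the unequal code lengths usable.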
在本申请实施例中，视频编码器可执行帧间预测以减少图像之间的时间冗余。如前文所描述，根据不同视频压缩编解码标准的规定，CU可具有一个或多个预测单元PU。换句话说，多个PU可属于CU，或者PU和CU的尺寸相同。在本文中当CU和PU尺寸相同时，CU的分割模式为不分割，或者即为分割为一个PU，且统一使用PU进行表述。当视频编码器执行帧间预测时，视频编码器可用信号通知视频解码器用于PU的运动信息。示例性的，PU的运动信息可以包括：参考图像索引、运动矢量和预测方向标识。运动矢量可指示PU的图像块(也称视频块、像素块、像素集合等)与PU的参考块之间的位移。PU的参考块可为类似于PU的图像块的参考图像的一部分。参考块可定位于由参考图像索引和预测方向标识指示的参考图像中。In this embodiment of the application, the video encoder may perform inter prediction to reduce temporal redundancy between pictures. As described above, a CU may have one or more prediction units (PUs) according to the provisions of different video compression codec standards. In other words, multiple PUs may belong to a CU, or the PU and the CU may be the same size. Herein, when the CU and the PU have the same size, the partitioning mode of the CU is no partitioning, or the CU is partitioned into a single PU; "PU" is used uniformly in the description. When the video encoder performs inter prediction, it may signal the motion information of the PU to the video decoder. Illustratively, the motion information of a PU may include a reference picture index, a motion vector, and a prediction direction identifier. A motion vector may indicate a displacement between an image block (also called a video block, pixel block, pixel set, etc.) of the PU and a reference block of the PU. The reference block of a PU may be a part of a reference picture that is similar to the image block of the PU. The reference block may be located in the reference picture indicated by the reference picture index and the prediction direction identifier.
为了减少表示PU的运动信息所需要的编码比特的数目，视频编码器可根据合并预测模式或高级运动矢量预测模式过程产生用于PU中的每一者的候选预测运动矢量(Motion Vector,MV)列表。用于PU的候选预测运动矢量列表中的每一候选预测运动矢量可指示运动信息，MV列表也可称之为候选运动信息列表。由候选预测运动矢量列表中的一些候选预测运动矢量指示的运动信息可基于其它PU的运动信息。如果候选预测运动矢量指示指定空间候选预测运动矢量位置或时间候选预测运动矢量位置中的一者的运动信息，则本申请可将所述候选预测运动矢量称作“原始”候选预测运动矢量。举例来说，对于合并(Merge)模式，在本文中也称为合并预测模式，可存在五个原始空间候选预测运动矢量位置和一个原始时间候选预测运动矢量位置。在一些实例中，视频编码器可通过组合来自不同原始候选预测运动矢量的部分运动矢量、修改原始候选预测运动矢量或仅插入零运动矢量作为候选预测运动矢量来产生额外候选预测运动矢量。这些额外候选预测运动矢量不被视为原始候选预测运动矢量且在本申请中可称作人工产生的候选预测运动矢量。To reduce the number of coding bits required to represent the motion information of a PU, the video encoder may generate a candidate prediction motion vector (MV) list for each of the PUs according to a merge prediction mode or advanced motion vector prediction mode process. Each candidate prediction motion vector in the candidate prediction motion vector list for a PU may indicate motion information; the MV list may also be referred to as a candidate motion information list. The motion information indicated by some candidate prediction motion vectors in the list may be based on the motion information of other PUs. If a candidate prediction motion vector indicates the motion information of one of the specified spatial candidate prediction motion vector positions or temporal candidate prediction motion vector positions, this application may refer to that candidate prediction motion vector as an "original" candidate prediction motion vector. For example, for the merge mode, also referred to herein as the merge prediction mode, there may be five original spatial candidate prediction motion vector positions and one original temporal candidate prediction motion vector position. In some examples, the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying original candidate prediction motion vectors, or simply inserting zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered original candidate prediction motion vectors and may be referred to in this application as artificially generated candidate prediction motion vectors.
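The construction order described above (original candidates first, then artificially generated ones such as zero motion vectors) can be sketched roughly as follows. The target length of five and the zero-MV padding rule are illustrative assumptions, not the patent's normative procedure:

```python
def build_candidate_list(spatial, temporal, target_len=5):
    """Build a candidate MV list: spatial candidates, then temporal
    candidates, then artificial zero-MV candidates as padding."""
    candidates = []
    for mv in spatial + temporal:
        # Skip duplicates and stop once the list is full.
        if mv not in candidates and len(candidates) < target_len:
            candidates.append(mv)
    while len(candidates) < target_len:
        candidates.append((0, 0))  # artificially generated zero-MV candidate
    return candidates

lst = build_candidate_list(spatial=[(1, 2), (1, 2), (3, 4)], temporal=[(5, 6)])
# → [(1, 2), (3, 4), (5, 6), (0, 0), (0, 0)]
```

Because the encoder and decoder run the same deterministic procedure, both ends obtain identical lists without the list itself being transmitted.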
本申请的技术一般涉及用于在视频编码器处产生候选预测运动矢量列表的技术和用于在视频解码器处产生相同候选预测运动矢量列表的技术。视频编码器和视频解码器可通过实施用于构建候选预测运动矢量列表的相同技术来产生相同候选预测运动矢量列表。举例来说，视频编码器和视频解码器两者可构建具有相同数目的候选预测运动矢量(例如，五个候选预测运动矢量)的列表。视频编码器和解码器可首先考虑空间候选预测运动矢量(例如，同一图像中的相邻块)，接着考虑时间候选预测运动矢量(例如，不同图像中的候选预测运动矢量)，且最后可考虑人工产生的候选预测运动矢量直到将所要数目的候选预测运动矢量添加到列表为止。根据本申请的技术，可在候选预测运动矢量列表构建期间，在候选预测运动矢量列表中通过标识位指示一种类型的候选预测运动矢量，以控制候选预测运动矢量列表的长度。举例来说，将空间候选预测运动矢量集合和时间候选预测运动矢量可作为原始候选预测运动矢量，当将人工产生的候选预测运动矢量添加到候选预测运动矢量的列表时，可在候选预测运动矢量列表中增加一个标识位空间，以指示人工产生的候选预测运动矢量集合。在编解码时，当选中一个标识位时，从该标识位指示的候选预测运动矢量集合中选取预测运动矢量。The techniques of this application generally relate to techniques for generating a candidate prediction motion vector list at a video encoder and techniques for generating the same candidate prediction motion vector list at a video decoder. The video encoder and the video decoder may generate the same candidate prediction motion vector list by implementing the same techniques for constructing the list. For example, both the video encoder and the video decoder may construct lists with the same number of candidate prediction motion vectors (e.g., five candidate prediction motion vectors). The video encoder and decoder may first consider spatial candidate prediction motion vectors (e.g., neighboring blocks in the same picture), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different pictures), and finally consider artificially generated candidate prediction motion vectors, until the desired number of candidate prediction motion vectors has been added to the list. According to the techniques of this application, during construction of the candidate prediction motion vector list, an identification bit may be used in the list to indicate one type of candidate prediction motion vector, so as to control the length of the candidate prediction motion vector list. For example, the set of spatial candidate prediction motion vectors and the temporal candidate prediction motion vector may serve as original candidate prediction motion vectors; when artificially generated candidate prediction motion vectors are added to the list of candidate prediction motion vectors, an identification-bit slot may be added to the candidate prediction motion vector list to indicate the set of artificially generated candidate prediction motion vectors. During encoding and decoding, when an identification bit is selected, a prediction motion vector is selected from the set of candidate prediction motion vectors indicated by that identification bit.
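A rough sketch of the identification-bit idea above: one list entry stands in for the whole set of artificially generated candidates, so the list itself stays short. All names and values here are illustrative, not taken from the patent:

```python
# The set of artificially generated candidates that the flag entry denotes.
artificial_set = [(0, 0), (1, 0), (0, 1)]

# Candidate list: two original candidates plus one flag entry; the flag
# occupies a single slot regardless of how large the set is.
candidate_list = [(1, 2), (3, 4), "ARTIFICIAL_SET"]

def select(index, sub_index=None):
    entry = candidate_list[index]
    if entry == "ARTIFICIAL_SET":
        # When the flag entry is selected, a second index picks within the set.
        return artificial_set[sub_index]
    return entry
```

Selecting index 0 or 1 needs no further information, while selecting index 2 requires the extra sub-index; this matches the two-level signaling described later in the text.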
在产生用于CU的PU的候选预测运动矢量列表之后，视频编码器可从候选预测运动矢量列表选择候选预测运动矢量且在码流中输出候选预测运动矢量索引。选定候选预测运动矢量可为具有产生最紧密地匹配正被解码的目标PU的预测子的运动矢量的候选预测运动矢量。候选预测运动矢量索引可指示在候选预测运动矢量列表中选定候选预测运动矢量的位置。视频编码器还可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说，在合并模式中，PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中，PU的运动信息可基于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定。视频编码器可基于CU的PU的预测性图像块和用于CU的原始图像块产生用于CU的一或多个残余图像块。视频编码器可接着编码一或多个残余图像块且在码流中输出一或多个残余图像块。After generating the candidate prediction motion vector list for the PUs of a CU, the video encoder may select a candidate prediction motion vector from the list and output a candidate prediction motion vector index in the code stream. The selected candidate prediction motion vector may be the one whose motion vector produces the predictor that most closely matches the target PU being decoded. The candidate prediction motion vector index may indicate the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector. For example, in the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector. The video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PUs of the CU and the original image block of the CU. The video encoder may then encode the one or more residual image blocks and output them in the code stream.
码流可包括识别PU的候选预测运动矢量列表中的选定候选预测运动矢量的数据，本文中称之为标识或者信号。该数据可以包括候选预测运动矢量列表中的索引，通过该索引确定出目标运动矢量；或者，通过该索引确定出目标运动矢量为某一类候选预测运动矢量，此时，该数据还包括指示选定候选预测运动矢量的数据在该类候选预测运动矢量中的具体位置的信息。视频解码器可解析码流，获取识别PU的候选预测运动矢量列表中的选定候选预测运动矢量的数据，根据该数据确定选定候选预测运动矢量的数据，基于由PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器可基于PU的运动信息识别用于PU的一或多个参考块。在识别PU的一或多个参考块之后，视频解码器可基于PU的一或多个参考块产生用于PU的预测性图像块。视频解码器可基于用于CU的PU的预测性图像块和用于CU的一或多个残余图像块来重构用于CU的图像块。The code stream may include data identifying the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU, referred to herein as an identifier or signal. The data may include an index into the candidate prediction motion vector list, through which the target motion vector is determined; or the index may determine that the target motion vector belongs to a certain type of candidate prediction motion vector, in which case the data further includes information indicating the specific position of the selected candidate prediction motion vector within that type of candidate prediction motion vectors. The video decoder may parse the code stream to obtain the data identifying the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU, determine the selected candidate prediction motion vector from that data, and determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU. The video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on them. The video decoder may reconstruct the image block for the CU based on the predictive image blocks of the PUs of the CU and the one or more residual image blocks of the CU.
为了易于解释，本申请可将位置或图像块描述为与CU或PU具有各种空间关系。此描述可解释为是指位置或图像块和与CU或PU相关联的图像块具有各种空间关系。此外，本申请可将视频解码器当前在解码的PU称作当前PU，也称为当前待处理图像块。本申请可将视频解码器当前在解码的CU称作当前CU。本申请可将视频解码器当前在解码的图像称作当前图像。应理解，本申请同时适用于PU和CU具有相同尺寸，或者PU即为CU的情况，统一使用PU来表示。For ease of explanation, this application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description may be interpreted to mean that the position or image block has various spatial relationships with the image block associated with the CU or PU. In addition, this application may refer to the PU currently being decoded by the video decoder as the current PU, also called the current to-be-processed image block. This application may refer to the CU currently being decoded by the video decoder as the current CU, and to the picture currently being decoded as the current picture. It should be understood that this application applies equally to cases where the PU and the CU have the same size, or where the PU is the CU; "PU" is used uniformly to represent both.
如前文简短地描述，视频编码器100可使用帧间预测以产生用于CU的PU的预测性图像块和运动信息。在许多例子中，给定PU的运动信息可能与一或多个附近PU(即，其图像块在空间上或时间上在给定PU的图像块附近的PU)的运动信息相同或类似。因为附近PU经常具有类似运动信息，所以视频编码器100可参考附近PU的运动信息来编码给定PU的运动信息。参考附近PU的运动信息来编码给定PU的运动信息可减少码流中指示给定PU的运动信息所需要的编码比特的数目。As briefly described above, the video encoder 100 may use inter prediction to generate predictive image blocks and motion information for the PUs of a CU. In many examples, the motion information of a given PU may be the same as or similar to the motion information of one or more nearby PUs (i.e., PUs whose image blocks are spatially or temporally near the image block of the given PU). Because nearby PUs often have similar motion information, the video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs. Encoding the motion information of a given PU with reference to the motion information of nearby PUs can reduce the number of coding bits required in the code stream to indicate the motion information of the given PU.
视频编码器100可以各种方式参考附近PU的运动信息来编码给定PU的运动信息。举例来说，视频编码器100可指示给定PU的运动信息与附近PU的运动信息相同。本申请可使用合并模式来指代指示给定PU的运动信息与附近PU的运动信息相同或可从附近PU的运动信息导出。在另一可行的实施方式中，视频编码器100可计算用于给定PU的运动矢量差(Motion Vector Difference,MVD)。MVD指示给定PU的运动矢量与附近PU的运动矢量之间的差。视频编码器100可将MVD而非给定PU的运动矢量包括于给定PU的运动信息中。在码流中表示MVD比表示给定PU的运动矢量所需要的编码比特少。本申请可使用高级运动矢量预测模式指代通过使用MVD和识别候选者运动矢量的索引值来用信号通知解码端给定PU的运动信息。The video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, the video encoder 100 may indicate that the motion information of the given PU is the same as the motion information of a nearby PU. This application may use the merge mode to refer to indicating that the motion information of a given PU is the same as, or can be derived from, the motion information of a nearby PU. In another feasible implementation, the video encoder 100 may calculate a motion vector difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU. The video encoder 100 may include the MVD, rather than the motion vector of the given PU, in the motion information of the given PU. Representing the MVD in the code stream requires fewer coding bits than representing the motion vector of the given PU. This application may use the advanced motion vector prediction mode to refer to signaling the motion information of a given PU to the decoder by using an MVD and an index value identifying a candidate motion vector.
为了使用合并模式或AMVP模式来用信号通知解码端给定PU的运动信息，视频编码器100可产生用于给定PU的候选预测运动矢量列表。候选预测运动矢量列表可包括一或多个候选预测运动矢量。用于给定PU的候选预测运动矢量列表中的候选预测运动矢量中的每一者可指定运动信息。由每一候选预测运动矢量指示的运动信息可包括运动矢量、参考图像索引和预测方向标识。候选预测运动矢量列表中的候选预测运动矢量可包括“原始”候选预测运动矢量，其中每一者指示不同于给定PU的PU内的指定候选预测运动矢量位置中的一者的运动信息。In order to signal the motion information of a given PU to the decoder using the merge mode or the AMVP mode, the video encoder 100 may generate a candidate prediction motion vector list for the given PU. The candidate prediction motion vector list may include one or more candidate prediction motion vectors. Each of the candidate prediction motion vectors in the list for the given PU may specify motion information. The motion information indicated by each candidate prediction motion vector may include a motion vector, a reference picture index, and a prediction direction identifier. The candidate prediction motion vectors in the list may include "original" candidate prediction motion vectors, each of which indicates motion information of one of the specified candidate prediction motion vector positions within a PU other than the given PU.
在产生用于PU的候选预测运动矢量列表之后,视频编码器100可从用于PU的候选预测运动矢量列表选择候选预测运动矢量中的一者。举例来说,视频编码器可比较每一候选预测运动矢量与正被解码的PU且可选择具有所要码率-失真代价的候选预测运动矢量。视频编码器100可输出用于PU的候选预测运动矢量索引。候选预测运动矢量索引可识别选定候选预测运动矢量在候选预测运动矢量列表中的位置。After generating the candidate prediction motion vector list for the PU, the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, a video encoder may compare each candidate prediction motion vector with the PU being decoded and may select a candidate prediction motion vector with a desired code rate-distortion cost. Video encoder 100 may output a candidate prediction motion vector index for a PU. The candidate prediction motion vector index may identify the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
此外,视频编码器100可基于由PU的运动信息指示的参考块产生用于PU的预测性图像块。可基于由用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。举例来说,在合并模式中,PU的运动信息可与由选定候选预测运动矢量指示的运动信息相同。在AMVP模式中,可基于用于PU的运动矢量差和由选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频编码器100可如前文所描述处理用于PU的预测性图像块。In addition, the video encoder 100 may generate a predictive image block for a PU based on a reference block indicated by motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU. For example, in the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector. In the AMVP mode, motion information of a PU may be determined based on a motion vector difference for the PU and motion information indicated by a selected candidate prediction motion vector. Video encoder 100 may process predictive image blocks for a PU as described previously.
如前述,候选预测运动矢量列表中可以使用标识位指示一类候选预测运动矢量,以控制候选预测运动矢量列表的长度。此处不再进行赘述。As mentioned above, an identifier bit may be used in the candidate prediction motion vector list to indicate a type of candidate prediction motion vector to control the length of the candidate prediction motion vector list. I will not repeat them here.
当视频解码器200接收到码流时,视频解码器200可产生用于CU的PU中的每一者的候选预测运动矢量列表。由视频解码器200针对PU产生的候选预测运动矢量列表可与由视频编码器100针对PU产生的候选预测运动矢量列表相同。视频解码器200从码流中解析得到的语法元素可指示在PU的候选预测运动矢量列表中选定候选预测运动矢量的位置。在产生用于PU的候选预测运动矢量列表之后,视频解码器200可基于由PU的运动信息指示的一或多个参考块产生用于PU的预测性图像块。视频解码器200可基于解析码流获取的语法元素,从用于PU的候选预测运动矢量列表中的选定候选预测运动矢量指示的运动信息确定PU的运动信息。视频解码器200可基于用于PU的预测性图像块和用于CU的残余图像块重构用于CU的图像块。When video decoder 200 receives a code stream, video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU. The candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU. The syntax element parsed by the video decoder 200 from the bitstream may indicate the position of the candidate prediction motion vector selected in the candidate prediction motion vector list of the PU. After generating a list of candidate prediction motion vectors for the PU, the video decoder 200 may generate predictive image blocks for the PU based on one or more reference blocks indicated by the motion information of the PU. The video decoder 200 may determine the motion information of the PU from the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU based on the syntax element obtained by parsing the bitstream. Video decoder 200 may reconstruct an image block for a CU based on a predictive image block for a PU and a residual image block for a CU.
如前述，候选预测运动矢量列表中可以使用标识位指示一类候选预测运动矢量，在此情况下，视频解码器200在接收到码流后，先解析码流获取第一标识，第一标识指示PU的候选预测运动矢量列表中选定候选预测运动矢量的位置。其中，PU的候选预测运动矢量列表中包括至少一个第一候选运动矢量及至少一个第二候选集合，第二候选集合包括至少一个第二候选运动矢量。视频解码器200根据第一标识，从PU的候选预测运动矢量列表中确定第一标识对应的目标元素。若该目标元素为第一候选运动矢量，则视频解码器200将目标元素确定为该PU的目标运动矢量，采用目标运动信息来预测待处理图像块(PU)的运动信息进行后续的解码流程。若该目标元素为第二候选集合时，则视频解码器200解析码流以获得第二标识，第二标识用于标识选定的候选预测运动矢量在第一标识指示的第二候选集合中的位置；视频解码器200根据第二标识，从第一标识指示的第二候选集合中的多个第二候选运动矢量中，确定目标运动信息，采用目标运动信息来预测待处理图像块(PU)的运动信息进行后续的解码流程。As mentioned above, an identification bit may be used in the candidate prediction motion vector list to indicate a type of candidate prediction motion vector. In this case, after receiving the code stream, the video decoder 200 first parses the code stream to obtain a first identifier, which indicates the position of the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU. The candidate prediction motion vector list of the PU includes at least one first candidate motion vector and at least one second candidate set, and the second candidate set includes at least one second candidate motion vector. According to the first identifier, the video decoder 200 determines the target element corresponding to the first identifier from the candidate prediction motion vector list of the PU. If the target element is a first candidate motion vector, the video decoder 200 determines the target element as the target motion vector of the PU, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in the subsequent decoding process. If the target element is a second candidate set, the video decoder 200 parses the code stream to obtain a second identifier, which identifies the position of the selected candidate prediction motion vector within the second candidate set indicated by the first identifier; according to the second identifier, the video decoder 200 determines the target motion information from the plurality of second candidate motion vectors in the second candidate set indicated by the first identifier, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in the subsequent decoding process.
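The two-level parsing just described can be sketched as below. Here `read_second_id` stands in for parsing the second identifier from the code stream, invoked only when the first identifier points at a second candidate set; the function and variable names are ours, not the patent's:

```python
def decode_motion_vector(candidate_list, first_id, read_second_id):
    """Resolve the target MV: the first identifier selects a list element;
    if that element is a second candidate set, a second identifier
    selects a motion vector within it."""
    target = candidate_list[first_id]
    if isinstance(target, list):      # element is a second candidate set
        second_id = read_second_id()  # parsed from the code stream only now
        return target[second_id]
    return target                     # element is a first candidate MV

# Last entry models a second candidate set embedded in the list.
mvp_list = [(1, 1), (2, 2), [(3, 3), (4, 4)]]
mv = decode_motion_vector(mvp_list, first_id=2, read_second_id=lambda: 1)
# → (4, 4)
```

Note that the second identifier costs bits only when the flag element is actually selected, which is the point of keeping the set behind a single list slot.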
As described above, a flag bit in the candidate predicted motion vector list may indicate a class of candidate predicted motion vectors. In this case, after receiving the bitstream, the video decoder 200 first parses the bitstream to obtain a first identifier, where the first identifier indicates the position of the selected candidate predicted motion vector in the candidate predicted motion vector list of the PU. The candidate predicted motion vector list of the PU includes at least one first candidate motion vector and a plurality of pieces of second candidate motion information, where the first candidate motion information includes first motion information, and the second candidate motion information includes preset motion information offsets. According to the first identifier, the video decoder 200 determines, in the candidate predicted motion vector list of the PU, the target element corresponding to the first identifier. If the target element is a first candidate motion vector, the video decoder 200 determines the target element as the target motion vector of the PU, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in the subsequent decoding process. If the target element is obtained from the plurality of pieces of second candidate motion information, the video decoder 200 parses the bitstream to obtain a second identifier, determines, according to the second identifier, the target motion information based on one of the plurality of pieces of second candidate motion information, and uses the target motion information to predict the motion information of the to-be-processed image block (PU) in the subsequent decoding process.
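The two-level parsing flow just described can be sketched as follows. All names and the list layout are hypothetical illustrations, not the actual bitstream syntax: a candidate list mixes plain motion vectors (first candidates) with sets of preset offsets (second candidates), and a second identifier is parsed only when the first identifier selects a second-candidate set.

```python
from dataclasses import dataclass

@dataclass
class MotionVector:
    x: int
    y: int

def resolve_target_motion(candidate_list, first_id, parse_second_id):
    """Return the target motion vector for the PU.

    candidate_list mixes plain MotionVector entries (first candidates)
    with (base, offsets) pairs standing in for second-candidate sets.
    """
    entry = candidate_list[first_id]
    if isinstance(entry, MotionVector):
        # First candidate: used directly, no second identifier is parsed.
        return entry
    # Second-candidate set: a further identifier selects one preset
    # offset, applied to a base motion vector to form the target.
    base, offsets = entry
    second_id = parse_second_id()      # would be read from the bitstream
    off = offsets[second_id]
    return MotionVector(base.x + off[0], base.y + off[1])

# Example: first_id selects the second-candidate set; second_id picks offset 0.
target = resolve_target_motion(
    [MotionVector(4, -2), (MotionVector(4, -2), [(1, 0), (0, 1)])],
    first_id=1, parse_second_id=lambda: 0)
```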
It should be noted that the candidate motion vectors in the candidate predicted motion vector list may be obtained according to different modes; this is not specifically limited in this application.
It should be understood that, in a feasible implementation, at the decoding end, constructing the candidate predicted motion vector list and parsing, from the bitstream, the position of the selected candidate predicted motion vector in that list are independent of each other, and may be performed in any order or in parallel.
In another feasible implementation, at the decoding end, the position of the selected candidate predicted motion vector in the candidate predicted motion vector list is first parsed from the bitstream, and the candidate predicted motion vector list is constructed based on the parsed position. In this implementation, it is not necessary to construct the entire candidate predicted motion vector list; the list only needs to be constructed up to the parsed position, that is, it suffices that the candidate predicted motion vector at that position can be determined. For example, when parsing the bitstream yields that the selected candidate predicted motion vector is the candidate with index 3 in the candidate predicted motion vector list, only the candidates from index 0 to index 3 need to be constructed to determine the candidate predicted motion vector with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
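The early-termination construction described above can be sketched as follows. The zero-argument "candidate sources" are a hypothetical stand-in for deriving candidates from spatial or temporal neighbours; the point is only that candidates beyond the parsed index are never derived.

```python
def build_candidate_up_to(index, candidate_sources):
    """Build the candidate list only as far as the parsed position.

    candidate_sources is an ordered iterable of zero-argument callables,
    each deriving one candidate. Only index + 1 of them are ever invoked.
    """
    lst = []
    for derive in candidate_sources:
        lst.append(derive())
        if len(lst) > index:   # stop once the parsed index is reachable
            break
    return lst[index]

calls = []
def source(i):
    def derive():
        calls.append(i)        # record which candidates were derived
        return f"cand{i}"
    return derive

# Parsed index 3: only candidates 0..3 are constructed, 4..9 are skipped.
selected = build_candidate_up_to(3, [source(i) for i in range(10)])
```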
FIG. 2 is a block diagram of an example video encoder 100 described in the embodiments of this application. The video encoder 100 is configured to output video to a post-processing entity 41. The post-processing entity 41 represents an example of a video entity that can process encoded video data from the video encoder 100, for example, a media-aware network element (MANE) or a splicing/editing apparatus. In some cases, the post-processing entity 41 may be an instance of a network entity. In some video encoding systems, the post-processing entity 41 and the video encoder 100 may be parts of separate apparatuses, while in other cases, the functionality described with respect to the post-processing entity 41 may be performed by the same apparatus that includes the video encoder 100. In an example, the post-processing entity 41 is an instance of the storage apparatus 40 in FIG. 1.
In the example of FIG. 2, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded picture buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. The filter unit 106 is intended to represent one or more loop filters, for example, a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 106 is shown as an in-loop filter in FIG. 2, in other implementations, the filter unit 106 may be implemented as a post-loop filter. In an example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown in the figure).
The video data memory may store video data to be encoded by the components of the video encoder 100. The video data stored in the video data memory may be obtained from the video source 120. The DPB 107 may be a reference picture memory that stores reference video data used by the video encoder 100 to encode video data in intra and inter coding modes. The video data memory and the DPB 107 may be formed by any of a variety of memory apparatuses, for example, a dynamic random access memory (DRAM) including a synchronous dynamic random access memory (SDRAM), a magnetoresistive RAM (MRAM), a resistive RAM (RRAM), or another type of memory apparatus. The video data memory and the DPB 107 may be provided by a same memory apparatus or by separate memory apparatuses. In various examples, the video data memory may be on-chip together with other components of the video encoder 100, or off-chip relative to those components.
As shown in FIG. 2, the video encoder 100 receives video data and stores the video data in the video data memory. The partitioning unit partitions the video data into several image blocks, and these image blocks may be further partitioned into smaller blocks, for example, by image block partitioning based on a quadtree structure or a binary tree structure. The partitioning may further include partitioning into slices, tiles, or other larger units. The video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into multiple image blocks (and possibly into sets of image blocks referred to as tiles). The prediction processing unit 108 may select one of multiple possible coding modes for the current image block, for example, one of multiple intra coding modes or one of multiple inter coding modes. The prediction processing unit 108 may provide the resulting intra- or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct the encoded block used as a reference picture.
The intra predictor 109 in the prediction processing unit 108 may perform intra predictive encoding of the current image block relative to one or more neighboring blocks that are in the same frame or slice as the current block to be encoded, to remove spatial redundancy. The inter predictor 110 in the prediction processing unit 108 may perform inter predictive encoding of the current image block relative to one or more prediction blocks in one or more reference pictures, to remove temporal redundancy.
Specifically, the inter predictor 110 may be configured to determine an inter prediction mode used to encode the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values of the various inter prediction modes in a candidate inter prediction mode set, and select, from the set, the inter prediction mode with the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, the inter predictor 110 may determine, in the candidate inter prediction mode set, the inter prediction mode with the lowest rate-distortion cost for encoding the current image block as the inter prediction mode used for inter prediction of the current image block.
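The mode decision above can be sketched as a minimum over a Lagrangian cost J = D + λ·R. The mode names and the distortion and rate figures below are invented for illustration; a real encoder would measure distortion (for example, SSE or SAD) against the reconstructed block and count the actual signaling bits.

```python
def select_inter_mode(modes, distortion, rate, lam):
    """Return the mode with the smallest RD cost J = D + lam * R."""
    return min(modes, key=lambda m: distortion[m] + lam * rate[m])

modes = ["merge", "amvp", "affine"]
D = {"merge": 120.0, "amvp": 90.0, "affine": 95.0}   # illustrative distortion
R = {"merge": 4, "amvp": 30, "affine": 28}           # illustrative bits
best = select_inter_mode(modes, D, R, lam=2.0)       # costs: 128, 150, 151
```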
The inter predictor 110 is configured to predict motion information (for example, a motion vector) of one or more subblocks in the current image block based on the determined inter prediction mode, and obtain or generate a prediction block of the current image block by using the motion information (for example, the motion vector) of the one or more subblocks. The inter predictor 110 may locate, in one of the reference picture lists, the prediction block to which the motion vector points. The inter predictor 110 may further generate syntax elements associated with the image blocks and the video slice, for use by the video decoder 200 when decoding the image blocks of the video slice. Alternatively, in an example, the inter predictor 110 performs a motion compensation process by using the motion information of each subblock to generate a prediction block of each subblock, thereby obtaining the prediction block of the current image block. It should be understood that the inter predictor 110 here performs motion estimation and motion compensation processes.
Specifically, after selecting an inter prediction mode for the current image block, the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
The intra predictor 109 may perform intra prediction on the current image block. Specifically, the intra predictor 109 may determine an intra prediction mode used to encode the current block. For example, the intra predictor 109 may use rate-distortion analysis to calculate rate-distortion values of the various intra prediction modes to be tested, and select, from the modes to be tested, the intra prediction mode with the best rate-distortion characteristics. In any case, after selecting an intra prediction mode for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103, so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After the prediction processing unit 108 generates the prediction block of the current image block through inter prediction or intra prediction, the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. The summer 112 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more transform units (TUs) and applied to the transformer 101. The transformer 101 transforms the residual video data into residual transform coefficients by using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. The transformer 101 may convert the residual video data from a pixel value domain to a transform domain, for example, a frequency domain.
The transformer 101 may send the resulting transform coefficients to the quantizer 102. The quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, the quantizer 102 may then scan the matrix containing the quantized transform coefficients. Alternatively, the entropy encoder 103 may perform the scan.
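A minimal sketch of the scalar quantization step just described, assuming a single illustrative step size and Python's default rounding; a real encoder derives the step from a quantization parameter and applies per-coefficient scaling, and the matching inverse quantization (dequantization) is what the decoder's inverse quantizer performs.

```python
def quantize(coeffs, qstep):
    # Divide each transform coefficient by the step and round to a level;
    # the rounding convention here is illustrative only.
    return [int(round(c / qstep)) for c in coeffs]

def dequantize(levels, qstep):
    # Inverse quantization: scale the levels back; the difference from the
    # original coefficients is the (lossy) quantization distortion.
    return [l * qstep for l in levels]

coeffs = [100.0, -7.0, 3.0, 0.4]
levels = quantize(coeffs, qstep=4.0)
recon = dequantize(levels, qstep=4.0)
```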
After quantization, the entropy encoder 103 entropy-encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 103, the encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200. The entropy encoder 103 may further entropy-encode the syntax elements of the current image block to be encoded.
The inverse quantizer 104 and the inverse transformer 105 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference picture. The summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109, to generate a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded picture buffer 107, and may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or pictures.
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for some image blocks or image frames, the video encoder 100 may directly quantize the residual signal without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105. Alternatively, for some image blocks or image frames, the video encoder 100 does not generate residual data, and correspondingly no processing by the transformer 101, the quantizer 102, the inverse quantizer 104, or the inverse transformer 105 is needed. Alternatively, the video encoder 100 may store the reconstructed image block directly as a reference block without processing by the filter unit 106. Alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be combined.
FIG. 3 is a block diagram of an example video decoder 200 described in the embodiments of this application. In the example of FIG. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a DPB 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process that is substantially the inverse of the encoding process described with respect to the video encoder 100 in FIG. 2.
During decoding, the video decoder 200 receives, from the video encoder 100, an encoded video bitstream representing the image blocks of an encoded video slice and the associated syntax elements. The video decoder 200 may receive video data from the network entity 42, and optionally may further store the video data in a video data memory (not shown in the figure). The video data memory may store video data to be decoded by the components of the video decoder 200, for example, the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage apparatus 40, from a local video source such as a camera, through wired or wireless network communication of the video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing the encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be a same memory, or may be separately disposed memories. The video data memory and the DPB 207 may be formed by any of a variety of memory apparatuses, for example, a dynamic random access memory (DRAM) including a synchronous DRAM (SDRAM), a magnetoresistive RAM (MRAM), a resistive RAM (RRAM), or another type of memory apparatus. In various examples, the video data memory may be integrated on a chip together with other components of the video decoder 200, or disposed off-chip relative to those components.
The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such apparatus for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder, for example, the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, the network entity 42 may implement some of the techniques described in this application. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate apparatuses, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same apparatus that includes the video decoder 200. In some cases, the network entity 42 may be an instance of the storage apparatus 40 in FIG. 1.
The entropy decoder 203 of the video decoder 200 entropy-decodes the bitstream to produce quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. The video decoder 200 may receive the syntax elements at the video slice level and/or the image block level.
When a video slice is decoded as an intra-decoded (I) slice, the intra predictor 209 of the prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or picture. When a video slice is decoded as an inter-decoded (that is, B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode used to decode the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether a new inter prediction mode is used to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used to predict the current image block, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block or a subblock of the current image block of the current video slice, and then uses the predicted motion information of the current image block or the subblock to obtain or generate, through a motion compensation process, a prediction block of the current image block or the subblock. The motion information here may include reference picture information and a motion vector, where the reference picture information may include but is not limited to unidirectional/bidirectional prediction information, a reference picture list number, and a reference picture index corresponding to the reference picture list. For inter prediction, the prediction block may be generated from one of the reference pictures in one of the reference picture lists. The video decoder 200 may construct the reference picture lists, namely list 0 and list 1, based on the reference pictures stored in the DPB 207. The reference frame index of the current picture may be included in one or more of reference frame list 0 and list 1. In some examples, the video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal specific syntax elements indicating whether a new inter prediction mode is used and which new inter prediction mode is specifically used to decode the specific block. It should be understood that the inter predictor 210 here performs a motion compensation process.
The inverse quantizer 204 inversely quantizes, that is, dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203. The inverse quantization process may include: using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied, and likewise determine the degree of inverse quantization that should be applied. The inverse transformer 205 applies an inverse transform to the transform coefficients, for example, an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to generate a residual block in the pixel domain.
After the inter predictor 210 generates the prediction block for the current image block or a subblock of the current image block, the video decoder 200 sums the residual block from the inverse transformer 205 and the corresponding prediction block generated by the inter predictor 210 to obtain a reconstructed block, that is, a decoded image block. The summer 211 represents the component that performs this summing operation. When needed, a loop filter (in the decoding loop or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The filter unit 206 may represent one or more loop filters, for example, a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 206 is shown as an in-loop filter in FIG. 3, in other implementations, the filter unit 206 may be implemented as a post-loop filter. In an example, the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as a decoded video stream. In addition, the decoded image blocks of a given frame or picture may also be stored in the decoded picture buffer 207, and the DPB 207 stores the reference pictures used for subsequent motion compensation. The DPB 207 may be part of a memory that may further store decoded video for later presentation on a display apparatus (for example, the display apparatus 220 in FIG. 1), or may be separate from such a memory.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video bitstream. For example, the video decoder 200 may generate the output video stream without processing by the filter unit 206. Alternatively, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and correspondingly no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
As noted above, the techniques of this application exemplarily involve inter decoding. It should be understood that the techniques of this application may be performed by any of the video coders described in this application, including, for example, the video encoder 100 and the video decoder 200 as shown and described with respect to FIG. 1 to FIG. 3. That is, in a feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data. Therefore, a reference to a generic "video encoder" or "video decoder" may include the video encoder 100, the video decoder 200, or another video encoding or decoding unit.
It should be understood that, in the video encoder 100 and the video decoder 200 of this application, the processing result of a particular step may be further processed before being output to the next step. For example, after steps such as interpolation filtering, motion vector derivation, or loop filtering, an operation such as clipping or shifting is further performed on the processing result of the corresponding step.
For example, the motion vector of a control point of the current image block, derived from the motion vector of a neighboring affine coding block, may be further processed; this is not limited in this application. For example, the value range of the motion vector is constrained so that it is within a certain bit width. Assuming that the allowed bit width of the motion vector is bitDepth, the range of the motion vector is -2^(bitDepth-1) to 2^(bitDepth-1)-1, where the "^" symbol represents exponentiation. If bitDepth is 16, the value range is -32768 to 32767. If bitDepth is 18, the value range is -131072 to 131071. The constraint may be applied in the following two ways:
Method 1: remove the overflowing high-order bits of the motion vector:
ux = ( vx + 2^bitDepth ) % 2^bitDepth

vx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux

uy = ( vy + 2^bitDepth ) % 2^bitDepth

vy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy
For example, if the value of vx is -32769, the value obtained by the above formulas is 32767. In a computer, values are stored in two's complement form; the two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits). The computer handles the overflow by discarding the high-order bit, so the value of vx becomes 0111,1111,1111,1111, that is, 32767, consistent with the result obtained by the formulas.
Method 2: clip the motion vector, as shown in the following formulas:
vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )

vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )
where Clip3 is defined as clamping the value of z to the interval [x, y]:

Clip3( x, y, z ) = x, if z < x; y, if z > y; z, otherwise
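The two constraint methods above can be transcribed directly, assuming the usual two's-complement interpretation: Method 1 wraps an overflowing value around by discarding high-order bits, while Method 2 (Clip3) clamps it to the nearest representable value, so the two methods can give different results for out-of-range inputs.

```python
def wrap_mv(v, bit_depth):
    """Method 1: keep the low bit_depth bits, reinterpreted as signed."""
    u = (v + (1 << bit_depth)) % (1 << bit_depth)
    return u - (1 << bit_depth) if u >= (1 << (bit_depth - 1)) else u

def clip3(x, y, z):
    """Clamp z to the interval [x, y]."""
    return x if z < x else y if z > y else z

def clip_mv(v, bit_depth):
    """Method 2: clamp to [-2^(bitDepth-1), 2^(bitDepth-1) - 1]."""
    return clip3(-(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1, v)

# For vx = -32769 with bitDepth = 16, the two methods diverge:
wrapped = wrap_mv(-32769, 16)   # wraps to 32767, as in the worked example
clipped = clip_mv(-32769, 16)   # clamps to -32768, the range minimum
```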
FIG. 4 is a schematic block diagram of the inter prediction module 121 in an embodiment of this application. The inter prediction module 121 may include, for example, a motion estimation unit and a motion compensation unit. The relationship between PUs and CUs differs among video compression coding standards. The inter prediction module 121 may partition the current CU into PUs according to multiple partitioning modes, for example the 2N×2N, 2N×N, N×2N, and N×N partitioning modes. In other embodiments, the current CU is itself the current PU; this is not limited.
The inter prediction module 121 may perform integer motion estimation (IME) and then fractional motion estimation (FME) on each of the PUs. When the inter prediction module 121 performs IME on a PU, it may search one or more reference images for a reference block for the PU. After finding the reference block, the inter prediction module 121 may generate, with integer precision, a motion vector indicating the spatial displacement between the PU and its reference block. When the inter prediction module 121 performs FME on the PU, it may refine the motion vector produced by IME; a motion vector produced by FME may have sub-integer precision (for example, 1/2-pixel or 1/4-pixel precision). After generating a motion vector for the PU, the inter prediction module 121 may use it to generate a predictive image block for the PU.
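As a rough illustration of the two-stage IME-then-FME search described above, the following deliberately simplified sketch uses one-dimensional sample rows, an exhaustive SAD search, and linear interpolation; real codecs operate on 2-D blocks with longer interpolation filters, and all names here are illustrative:

```python
def sad(block, ref, offset):
    """Sum of absolute differences at an integer offset."""
    return sum(abs(b - r) for b, r in zip(block, ref[offset:offset + len(block)]))

def interp(ref, pos_qpel):
    """Sample ref at a quarter-pel position using linear interpolation."""
    i, f = pos_qpel // 4, (pos_qpel % 4) / 4.0
    return ref[i] * (1 - f) + ref[min(i + 1, len(ref) - 1)] * f

def motion_estimate(block, ref):
    """IME: exhaustive integer search; FME: quarter-pel refinement around it.
    Returns the motion vector in quarter-pel units."""
    best_int = min(range(len(ref) - len(block) + 1),
                   key=lambda o: sad(block, ref, o))
    def frac_cost(q):
        return sum(abs(block[k] - interp(ref, q + 4 * k))
                   for k in range(len(block)))
    candidates = [4 * best_int + d for d in range(-3, 4)
                  if 0 <= 4 * best_int + d <= 4 * (len(ref) - len(block))]
    return min(candidates, key=frac_cost)

print(motion_estimate([5, 7, 9], [0, 1, 5, 7, 9, 2, 0]))  # 8, i.e. 2 full pixels
print(motion_estimate([3, 5, 7], [0, 2, 4, 6, 8]))        # 6, i.e. 1.5 pixels
```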
In some feasible implementations in which the inter prediction module 121 uses the AMVP mode to signal the motion information of a PU to the decoding end, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU. The list may include one or more original candidate predicted motion vectors and one or more additional candidates derived from them. After generating the list, the inter prediction module 121 may select a candidate from it and generate a motion vector difference (MVD) for the PU. The MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate and the motion vector generated for the PU using IME and FME. In these implementations, the inter prediction module 121 may output a candidate index identifying the position of the selected candidate in the list, and may also output the MVD of the PU. A feasible implementation of the advanced motion vector prediction (AMVP) mode in an embodiment of this application is described in detail below with reference to FIG. 6.
In addition to generating motion information for PUs by performing IME and FME, the inter prediction module 121 may also perform a merge operation on each of the PUs. When performing a merge operation on a PU, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU, which may include one or more original candidates and one or more additional candidates derived from them. The original candidates may include one or more spatial candidates and temporal candidates. A spatial candidate may indicate the motion information of another PU in the current image. A temporal candidate may be based on the motion information of a co-located PU in an image other than the current image; it may also be referred to as a temporal motion vector prediction (TMVP).

After generating the candidate list, the inter prediction module 121 may select one of the candidates, and may then generate a predictive image block for the PU based on the reference block indicated by the PU's motion information. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate. FIG. 5, described below, shows an exemplary flowchart of the merge operation.
According to the technology of this application, during construction of the candidate predicted motion vector list, the original candidates may be included in the list directly while an identifier indicates one type of additional candidates, so as to control the length of the list. In particular, different identifiers indicate different types of additional candidates. During encoding or decoding, when an identifier is selected, the predicted motion vector is chosen from the set of additional candidates indicated by that identifier. The candidates indicated by an identifier may be preset motion information offsets.
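The list construction described above might be sketched as follows; this is a hypothetical illustration, not the normative procedure, and all names and the example offset set are invented for this sketch. The list stores original candidates directly, plus one identifier entry per type of additional candidates, and selecting an identifier resolves to a predictor from the set it indicates:

```python
def build_candidate_list(original_mvs, offset_sets):
    """The list holds original candidates directly, plus one identifier
    entry per type of extra candidates, keeping the list short."""
    cand_list = [("mv", mv) for mv in original_mvs]
    for set_id in offset_sets:                # one identifier per set
        cand_list.append(("flag", set_id))
    return cand_list

def resolve(cand_list, index, base_mv, offset_sets, offset_index=0):
    """Selecting an identifier entry picks a predictor from the set it
    indicates, modeled here as preset offsets applied to a base MV."""
    kind, value = cand_list[index]
    if kind == "mv":
        return value
    dx, dy = offset_sets[value][offset_index]
    return (base_mv[0] + dx, base_mv[1] + dy)

offset_sets = {"quarter_pel": [(1, 0), (-1, 0), (0, 1), (0, -1)]}
lst = build_candidate_list([(5, -2), (3, 3)], offset_sets)
print(resolve(lst, 2, (5, -2), offset_sets, 1))  # identifier entry -> (4, -2)
```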
After generating a predictive image block for a PU based on IME and FME and a predictive image block based on the merge operation, the inter prediction module 121 may select either the predictive image block produced by the FME operation or the one produced by the merge operation. In some feasible implementations, it may make this selection based on a rate-distortion cost analysis of the two predictive image blocks.

After the inter prediction module 121 has selected predictive image blocks for the PUs produced by partitioning the current CU according to each of the partitioning modes (in some implementations, once a coding tree unit (CTU) has been split into CUs it is not further split into smaller PUs, in which case the PU is equivalent to the CU), the inter prediction module 121 may select a partitioning mode for the current CU. In some implementations, it may do so based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs produced under each partitioning mode. The inter prediction module 121 may output the predictive image blocks associated with the PUs of the selected partitioning mode to the residual generation module 102, and may output syntax elements indicating the motion information of those PUs to the entropy encoding module.
In the schematic diagram of FIG. 4, the inter prediction module 121 includes IME modules 180A to 180N (collectively, "IME modules 180"), FME modules 182A to 182N (collectively, "FME modules 182"), merge modules 184A to 184N (collectively, "merge modules 184"), PU mode decision modules 186A to 186N (collectively, "PU mode decision modules 186"), and a CU mode decision module 188 (which may also perform the mode decision process from CTU to CU).

The IME modules 180, FME modules 182, and merge modules 184 may perform IME, FME, and merge operations on the PUs of the current CU. FIG. 4 illustrates the inter prediction module 121 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include separate IME, FME, and merge modules for each PU of each partitioning mode.
As illustrated in the schematic diagram of FIG. 4, the IME module 180A, FME module 182A, and merge module 184A may perform IME, FME, and merge operations on the PU produced by partitioning the CU according to the 2N×2N partitioning mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, FME module 182A, and merge module 184A.

The IME module 180B, FME module 182B, and merge module 184B may perform IME, FME, and merge operations on the left PU produced by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, FME module 182B, and merge module 184B.

The IME module 180C, FME module 182C, and merge module 184C may perform IME, FME, and merge operations on the right PU produced by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, FME module 182C, and merge module 184C.

The IME module 180N, FME module 182N, and merge module 184N may perform IME, FME, and merge operations on the lower-right PU produced by partitioning the CU according to the N×N partitioning mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, FME module 182N, and merge module 184N.
The PU mode decision modules 186 may select a predictive image block based on a rate-distortion cost analysis of multiple possible predictive image blocks, choosing the one that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-constrained applications a PU mode decision module 186 may prefer predictive image blocks that increase the compression ratio, while for other applications it may prefer predictive image blocks that increase reconstructed video quality. After the PU mode decision modules 186 select predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects a partitioning mode for the current CU and outputs the predictive image blocks and motion information of the PUs belonging to the selected partitioning mode.
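The rate-distortion selection can be sketched with the usual Lagrangian cost J = D + λ·R, where a larger λ weights signaling bits more heavily (the bandwidth-constrained bias mentioned above). A minimal illustration, with invented numbers:

```python
def select_best(candidates, lam):
    """Pick the candidate with minimal rate-distortion cost J = D + lambda * R."""
    return min(candidates, key=lambda c: c["distortion"] + lam * c["bits"])

blocks = [
    {"name": "fme",   "distortion": 120.0, "bits": 30},  # better fidelity
    {"name": "merge", "distortion": 150.0, "bits": 4},   # cheaper to signal
]
# A large lambda (bandwidth-constrained) favors the cheap-to-signal merge block;
# a small lambda favors the higher-fidelity FME block.
print(select_best(blocks, lam=5.0)["name"])   # merge: 150+20=170 < 120+150=270
print(select_best(blocks, lam=0.1)["name"])   # fme:   120+3=123 < 150+0.4=150.4
```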
FIG. 5 is a flowchart of an implementation of the merge mode in an embodiment of this application. A video encoder (for example, video encoder 20) may perform a merge operation 200. The merge operation 200 may include: 202, generating a candidate list for the current prediction unit; 204, generating predictive video blocks associated with candidates in the candidate list; 206, selecting a candidate from the candidate list; and 208, outputting the candidate. Here, a candidate refers to a candidate motion vector or candidate motion information.

In other feasible implementations, the video encoder may perform a merge operation different from the merge operation 200. For example, it may perform a merge operation with more, fewer, or different steps than the merge operation 200, or perform the steps of the merge operation 200 in a different order or in parallel. The encoder may also perform the merge operation 200 on a PU encoded in skip mode.
After the video encoder starts the merge operation 200, it may generate a candidate predicted motion vector list for the current PU (202). The video encoder may do so in various ways, for example according to one of the example techniques described below with respect to FIGS. 8 to 12. According to the technology of this application, the candidate predicted motion vector list for the current PU includes at least one first candidate motion vector and an identifier of at least one second candidate motion vector set.
As mentioned above, the candidate predicted motion vector list for the current PU may include a temporal candidate, which may indicate the motion information of a co-located PU in the time domain. The co-located PU may occupy the same spatial position in the image frame as the current PU, but in a reference image rather than the current image. This application may refer to the reference image containing the co-located PU as the related reference image, and to its reference image index as the related reference image index. As described above, the current image may be associated with one or more reference image lists (for example, list 0, list 1). A reference image index indicates a reference image by indicating its position in a particular reference image list. In some feasible implementations, the current image may be associated with a combined reference image list.

In some video encoders, the related reference image index is the reference image index of the PU covering the reference index source position associated with the current PU. In these video encoders, the reference index source position associated with the current PU is adjacent to the left of or above the current PU. In this application, a PU "covers" a particular position if the image block associated with the PU includes that position. In these video encoders, if the reference index source position is unavailable, the video encoder may use a reference image index of zero.

However, there may be cases in which the reference index source position associated with the current PU is within the current CU. In these cases, the PU covering that position may be considered available if it is above or to the left of the current CU. However, the video encoder may then need to access the motion information of another PU of the current CU in order to determine the reference image containing the co-located PU. Therefore, these video encoders may use the motion information (that is, the reference image index) of a PU belonging to the current CU to generate the temporal candidate for the current PU. In other words, these video encoders may generate the temporal candidate using motion information of PUs belonging to the current CU. As a result, the video encoder may be unable to generate, in parallel, the candidate lists for the current PU and for the PU covering the reference index source position associated with the current PU.
The video encoder may explicitly set the related reference image index without referring to the reference image index of any other PU. This may enable the video encoder to generate the candidate lists for the current PU and the other PUs of the current CU in parallel. Because the related reference image index is set explicitly, it is not based on the motion information of any other PU of the current CU. In some feasible implementations in which it is set explicitly, the video encoder may always set the related reference image index to a fixed, predefined preset reference image index (for example, 0). In this way, the video encoder may generate a temporal candidate based on the motion information of the co-located PU in the reference frame indicated by the preset reference image index, and may include the temporal candidate in the candidate list of the current CU.

In feasible implementations in which the related reference image index is set explicitly, the video encoder may signal it explicitly in a syntax structure (for example, an image header, a slice header, an APS, or another syntax structure). In these implementations, the video encoder may signal to the decoding end the related reference image index for each LCU (that is, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the related reference image index for each PU of a CU is equal to "1".

In some feasible implementations, the related reference image index may be set implicitly rather than explicitly. In these implementations, the video encoder may generate each temporal candidate in the candidate lists of the PUs of the current CU using the motion information of PUs in the reference images indicated by the reference image indices of PUs covering positions outside the current CU, even if those positions are not strictly adjacent to the current PU.
After generating the candidate predicted motion vector list for the current PU, the video encoder may generate the predictive image blocks associated with the candidates in the list (204). It may do so by determining the motion information of the current PU based on the motion information of the indicated candidate and then generating a predictive image block based on the one or more reference blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidates from the list (206). It may do so in various ways, for example based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidates.

After selecting a candidate, the video encoder may output a candidate index (208). The candidate index may indicate the position of the selected candidate in the candidate list. In some feasible implementations, the candidate index may be denoted "merge_idx".
FIG. 6 is a flowchart of an implementation of the advanced motion vector prediction (AMVP) mode in an embodiment of this application. A video encoder (for example, video encoder 20) may perform an AMVP operation 210. The AMVP operation 210 may include: 211, generating one or more motion vectors for the current prediction unit; 212, generating a predictive video block for the current prediction unit; 213, generating a candidate list for the current prediction unit; 214, generating motion vector differences; 215, selecting a candidate from the candidate list; and 216, outputting a reference picture index, a candidate index, and the motion vector difference for the selected candidate. Here, a candidate refers to a candidate motion vector or candidate motion information.
After the video encoder starts the AMVP operation 210, it may generate one or more motion vectors for the current PU (211). The video encoder may perform integer motion estimation and fractional motion estimation to generate these motion vectors. As described above, the current image may be associated with two reference image lists (list 0 and list 1). If the current PU is unidirectionally predicted, the video encoder may generate a list-0 motion vector or a list-1 motion vector for the current PU. The list-0 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference image in list 0; the list-1 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference image in list 1. If the current PU is bidirectionally predicted, the video encoder may generate both a list-0 motion vector and a list-1 motion vector for the current PU.

After generating the one or more motion vectors for the current PU, the video encoder may generate a predictive image block for the current PU (212), based on the one or more reference blocks indicated by those motion vectors.
In addition, the video encoder may generate a candidate predicted motion vector list for the current PU (213). It may do so in various ways, for example according to one or more of the feasible implementations described below with respect to FIGS. 8 to 12. In some feasible implementations, when the video encoder generates the candidate list in the AMVP operation 210, the list may be limited to two candidates. In contrast, when the video encoder generates the candidate list in a merge operation, the list may include more candidates (for example, five).
After generating the candidate predicted motion vector list for the current PU, the video encoder may generate one or more motion vector differences (MVDs) for each candidate in the list (214). It may do so by determining the difference between the motion vector indicated by the candidate and the corresponding motion vector of the current PU.

If the current PU is unidirectionally predicted, the video encoder may generate a single MVD for each candidate. If the current PU is bidirectionally predicted, it may generate two MVDs for each candidate: the first MVD may indicate the difference between the candidate's motion vector and the list-0 motion vector of the current PU, and the second MVD may indicate the difference between the candidate's motion vector and the list-1 motion vector of the current PU.
The video encoder may select one or more of the candidates from the candidate list (215). It may do so in various ways. For example, it may select the candidate whose associated motion vector matches the motion vector to be encoded with minimal error, which may reduce the number of bits needed to represent the MVD for the candidate.
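A minimal sketch of this selection step (215), approximating the bit cost of an MVD by the sum of the absolute values of its components; all names and numbers here are illustrative:

```python
def best_predictor(mv, candidates):
    """Choose the candidate predictor whose MVD costs the fewest bits,
    approximated here by the sum of absolute MVD components."""
    def mvd(c):
        return (mv[0] - c[0], mv[1] - c[1])
    idx = min(range(len(candidates)),
              key=lambda i: abs(mvd(candidates[i])[0]) + abs(mvd(candidates[i])[1]))
    return idx, mvd(candidates[idx])

# AMVP list limited to two candidates, as noted above
mv = (13, -7)                  # motion vector found by IME/FME
cands = [(12, -8), (0, 0)]
idx, d = best_predictor(mv, cands)
print(idx, d)  # 0 (1, 1): signal candidate index 0 and MVD (1, 1)
```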
After selecting one or more candidates, the video encoder may output, for the current PU, one or more reference image indices, one or more candidate indices, and the one or more MVDs for the selected candidates (216).
在当前图像与两个参考图像列表(列表0和列表1)相关联且当前PU经单向预测的 例子中,视频编码器可输出用于列表0的参考图像索引(“ref_idx_10”)或用于列表1的参考图像索引(“ref_idx_11”)。视频编码器还可输出指示用于当前PU的列表0运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_10_flag”)。或者,视频编码器可输出指示用于当前PU的列表1运动矢量的选定候选预测运动矢量在候选预测运动矢量列表中的位置的候选预测运动矢量索引(“mvp_11_flag”)。视频编码器还可输出用于当前PU的列表0运动矢量或列表1运动矢量的MVD。In examples where the current picture is associated with two reference picture lists (List 0 and List 1) and the current PU is unidirectionally predicted, the video encoder may output a reference picture index ("ref_idx_10") for List 0 or for Reference image index of list 1 ("ref_idx_11"). The video encoder may also output a candidate prediction motion vector index ("mvp_10_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list. Alternatively, the video encoder may output a candidate prediction motion vector index ("mvp_11_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list. The video encoder may also output a list 0 motion vector or a list 1 motion vector MVD for the current PU.
In an example where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bi-directionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. In addition, the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. The video encoder may also output an MVD for the list 0 motion vector of the current PU and an MVD for the list 1 motion vector of the current PU.
FIG. 7 is an implementation flowchart of motion compensation performed by a video decoder (for example, video decoder 30) in an embodiment of this application.
When the video decoder performs motion compensation operation 220, the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
If the motion information of the current PU is encoded using the AMVP mode and the current PU is bi-directionally predicted, the video decoder may receive a first candidate prediction motion vector index and a second candidate prediction motion vector index. The first candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. The second candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. In some feasible implementations, a single syntax element may be used to identify the two candidate prediction motion vector indices.
In some feasible implementations, if the candidate prediction motion vector list is constructed according to the technology of this application, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU; alternatively, it may receive an identifier indicating the position, within the candidate prediction motion vector list of the current PU, of the class to which the selected candidate prediction motion vector belongs, together with a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within that class.
In addition, the video decoder may generate a candidate prediction motion vector list for the current PU (224). The video decoder may generate this candidate prediction motion vector list for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to FIG. 8 to FIG. 12 to generate the candidate prediction motion vector list for the current PU. When the video decoder generates a temporal candidate prediction motion vector for the candidate prediction motion vector list, the video decoder may explicitly or implicitly set the reference picture index identifying the reference picture that includes the co-located PU, as described above with reference to FIG. 5. According to the technology of this application, during construction of the candidate prediction motion vector list, one type of candidate prediction motion vector may be indicated in the candidate prediction motion vector list by an identifier bit, so as to control the length of the candidate prediction motion vector list.
After generating the candidate prediction motion vector list for the current PU, the video decoder may determine the motion information of the current PU based on the motion information indicated by the one or more selected candidate prediction motion vectors in the candidate prediction motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may use the one or more motion vectors indicated by the selected candidate prediction motion vector or vectors and the one or more MVDs indicated in the bitstream to reconstruct the one or more motion vectors of the current PU. The reference picture index and the prediction direction identifier of the current PU may be the same as the reference picture index and the prediction direction identifier of the one or more selected candidate prediction motion vectors. After determining the motion information of the current PU, the video decoder may generate a predictive picture block for the current PU based on the one or more reference blocks indicated by the motion information of the current PU (226).
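The AMVP reconstruction step described above (the selected candidate predictor from the list plus the signalled MVD, per reference list) can be sketched as follows. This is an illustrative sketch with hypothetical names and toy integer values, not the codec's normative arithmetic:

```python
def reconstruct_amvp_mv(candidate_list, mvp_index, mvd):
    """Reconstruct a motion vector in AMVP mode: the selected candidate
    prediction motion vector from the list plus the signalled motion
    vector difference (MVD), applied per component."""
    mvp = candidate_list[mvp_index]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

# A bi-predicted PU reconstructs one motion vector per reference list.
list0_candidates = [(4, -2), (0, 1)]
list1_candidates = [(-3, 5), (2, 2)]
mv_l0 = reconstruct_amvp_mv(list0_candidates, mvp_index=0, mvd=(1, 1))
mv_l1 = reconstruct_amvp_mv(list1_candidates, mvp_index=1, mvd=(-1, 0))
```

In this sketch, `mv_l0` evaluates to `(5, -1)` and `mv_l1` to `(1, 2)`; the reference picture index and prediction direction identifier would be inherited from the selected candidates, as described above.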
FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and the adjacent image blocks associated with it in an embodiment of this application, illustrating CU 250 and the schematic candidate prediction motion vector positions 252A to 252E associated with CU 250. This application may collectively refer to candidate prediction motion vector positions 252A to 252E as candidate prediction motion vector positions 252. The candidate prediction motion vector positions 252 represent spatial candidate prediction motion vectors in the same picture as CU 250. Candidate prediction motion vector position 252A is located to the left of CU 250. Candidate prediction motion vector position 252B is located above CU 250. Candidate prediction motion vector position 252C is located to the upper right of CU 250. Candidate prediction motion vector position 252D is located to the lower left of CU 250. Candidate prediction motion vector position 252E is located to the upper left of CU 250. FIG. 8 is a schematic implementation illustrating the manner in which the inter prediction module 121 and the motion compensation module may generate candidate prediction motion vector lists. The implementations below are explained with reference to the inter prediction module 121, but it should be understood that the motion compensation module may implement the same techniques and thus generate the same candidate prediction motion vector list.
FIG. 9 is an implementation flowchart of constructing a candidate prediction motion vector list in an embodiment of this application. The technique of FIG. 9 is described with reference to a list including five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes. The five candidate prediction motion vectors may each have an index (for example, 0 to 4). The technique of FIG. 9 is described with reference to a generic video decoder. The generic video decoder may be, for example, a video encoder (such as video encoder 20) or a video decoder (such as video decoder 30). The candidate prediction motion vector list constructed based on the technology of this application is described in detail in the embodiments below and is not repeated here.
To construct the candidate prediction motion vector list according to the implementation of FIG. 9, the video decoder first considers four spatial candidate prediction motion vectors (902). The four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate prediction motion vectors correspond to the motion information of four PUs in the same picture as the current CU (for example, CU 250). The video decoder may consider the four spatial candidate prediction motion vectors in the list in a particular order. For example, candidate prediction motion vector position 252A may be considered first. If candidate prediction motion vector position 252A is available, it may be assigned to index 0. If candidate prediction motion vector position 252A is unavailable, the video decoder may not include it in the candidate prediction motion vector list. A candidate prediction motion vector position may be unavailable for various reasons. For example, a candidate prediction motion vector position may be unavailable if it is not within the current picture. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is intra predicted. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is in a slice different from that of the current CU.
After considering candidate prediction motion vector position 252A, the video decoder may next consider candidate prediction motion vector position 252B. If candidate prediction motion vector position 252B is available and different from candidate prediction motion vector position 252A, the video decoder may add candidate prediction motion vector position 252B to the candidate prediction motion vector list. In this particular context, the terms "same" and "different" refer to the motion information associated with the candidate prediction motion vector positions. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information and are considered different if they have different motion information. If candidate prediction motion vector position 252A is unavailable, the video decoder may assign candidate prediction motion vector position 252B to index 0. If candidate prediction motion vector position 252A is available, the video decoder may assign candidate prediction motion vector position 252B to index 1. If candidate prediction motion vector position 252B is unavailable or is the same as candidate prediction motion vector position 252A, the video decoder skips candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
Candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If candidate prediction motion vector position 252C is available and not the same as candidate prediction motion vector positions 252B and 252A, the video decoder assigns candidate prediction motion vector position 252C to the next available index. If candidate prediction motion vector position 252C is unavailable or is not different from at least one of candidate prediction motion vector positions 252A and 252B, the video decoder does not include candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers candidate prediction motion vector position 252D. If candidate prediction motion vector position 252D is available and not the same as candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns candidate prediction motion vector position 252D to the next available index. If candidate prediction motion vector position 252D is unavailable or is not different from at least one of candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include candidate prediction motion vector position 252D in the candidate prediction motion vector list. The implementations above generally describe considering candidate prediction motion vectors 252A to 252D individually for inclusion in the candidate prediction motion vector list, but in some implementations, all candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list afterwards.
After the video decoder considers the first four spatial candidate prediction motion vectors, the candidate prediction motion vector list may include four spatial candidate prediction motion vectors, or the list may include fewer than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, yes), the video decoder considers a temporal candidate prediction motion vector (906). The temporal candidate prediction motion vector may correspond to the motion information of a co-located PU of a picture different from the current picture. If the temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4. If the temporal candidate prediction motion vector is unavailable or is the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector considered at block 906) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, yes), the video decoder completes building the list.
If the candidate prediction motion vector list includes four candidate prediction motion vectors (908, no), the video decoder may consider a fifth spatial candidate prediction motion vector (910). The fifth spatial candidate prediction motion vector may, for example, correspond to candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, assigned to index 4. If the candidate prediction motion vector at position 252E is unavailable or is not different from the candidate prediction motion vectors at candidate prediction motion vector positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. Therefore, after the fifth spatial candidate prediction motion vector is considered (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
If the candidate prediction motion vector list includes five candidate prediction motion vectors (912, yes), the video decoder completes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, no), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, yes).
If, after the video decoder considers the first four spatial candidate prediction motion vectors, the list includes fewer than four spatial candidate prediction motion vectors (904, no), the video decoder may consider the fifth spatial candidate prediction motion vector (918). The fifth spatial candidate prediction motion vector may, for example, correspond to candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, assigned to the next available index. If the candidate prediction motion vector at position 252E is unavailable or is not different from one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. The video decoder may then consider the temporal candidate prediction motion vector (920). If the temporal candidate prediction motion vector is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the temporal candidate prediction motion vector to the candidate prediction motion vector list, assigned to the next available index. If the temporal candidate prediction motion vector is unavailable or is not different from one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include the temporal candidate prediction motion vector in the candidate prediction motion vector list.
If, after considering the fifth spatial candidate prediction motion vector (block 918) and the temporal candidate prediction motion vector (block 920), the candidate prediction motion vector list includes five candidate prediction motion vectors (922, yes), the video decoder completes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes fewer than five candidate prediction motion vectors (922, no), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, yes).
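The list-construction flow of FIG. 9 can be summarized in a simplified sketch. It keeps the overall order of the flowchart (spatial candidates considered in order with duplicates skipped, then the temporal candidate, then artificial fill at block 914) but glosses over the exact interleaving of the fifth spatial candidate between blocks 910 and 918; the names and the zero-vector fill value are illustrative assumptions, not the patent's normative derivation:

```python
def build_merge_candidate_list(spatial, temporal, list_size=5):
    """Sketch of the FIG. 9 flow. Each entry of `spatial` is either None
    (unavailable position) or a motion-information tuple; `temporal` is the
    co-located candidate or None. Duplicates are skipped, and the list is
    padded with artificially generated candidates up to `list_size`."""
    candidates = []
    for cand in spatial:                      # positions 252A..252E in order
        if cand is not None and cand not in candidates:
            candidates.append(cand)
        if len(candidates) == list_size:
            return candidates
    if temporal is not None and temporal not in candidates:
        candidates.append(temporal)
    while len(candidates) < list_size:        # block 914: artificial fill
        candidates.append(('zero', (0, 0)))
    return candidates[:list_size]
```

For example, with two duplicate spatial candidates and one unavailable position, the temporal candidate lands at the next free index and the remainder is filled artificially.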
In a possible implementation, additional merge candidate prediction motion vectors may be artificially generated after the spatial candidate prediction motion vectors and the temporal candidate prediction motion vector, so that the size of the merge candidate prediction motion vector list is fixed at a specified number of merge candidate prediction motion vectors (for example, five, as in the feasible implementation of FIG. 9 above). The additional merge candidate prediction motion vectors may include, for example, a combined bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 1), a scaled bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 2), and a zero-vector merge/AMVP candidate prediction motion vector (candidate prediction motion vector 3). According to the technology of this application, the spatial candidate prediction motion vectors and the temporal candidate prediction motion vector may be included directly in the candidate prediction motion vector list, while the artificially generated additional merge candidate prediction motion vectors are indicated in the candidate prediction motion vector list by an identifier bit.
FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to the merge-mode candidate prediction motion vector list in an embodiment of this application. A combined bi-predictive merge candidate prediction motion vector may be generated by combining original merge candidate prediction motion vectors. Specifically, two of the original candidate prediction motion vectors (having mvL0_A and ref0, or mvL1_B and ref0) may be used to generate a bi-predictive merge candidate prediction motion vector. In FIG. 10, two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list. The prediction type of one candidate prediction motion vector is list 0 uni-prediction, and the prediction type of the other candidate prediction motion vector is list 1 uni-prediction. In this feasible implementation, mvL0_A and ref0 are taken from list 0, and mvL1_B and ref0 are taken from list 1; a bi-predictive merge candidate prediction motion vector (having mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may then be generated and checked to determine whether it differs from the candidate prediction motion vectors already included in the candidate prediction motion vector list. If it differs, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
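The FIG. 10 combination step can be sketched as follows, with candidates modelled as hypothetical dictionaries holding a (motion vector, reference index) pair per reference list; the duplicate check mirrors the "check whether it differs" step described above. This is an illustrative sketch, not the patent's normative derivation:

```python
def combine_bi_predictive(l0_part, l1_part, candidate_list):
    """Form a combined bi-predictive merge candidate from the list 0 part
    of one uni-predictive candidate and the list 1 part of another, and
    append it only if it differs from every candidate already in the list."""
    combined = {'l0': l0_part, 'l1': l1_part}
    if combined not in candidate_list:
        candidate_list.append(combined)
    return candidate_list

candidates = [
    {'l0': ((4, -2), 0), 'l1': None},   # list 0 uni-prediction: mvL0_A, ref0
    {'l0': None, 'l1': ((-3, 5), 0)},   # list 1 uni-prediction: mvL1_B, ref0
]
candidates = combine_bi_predictive(candidates[0]['l0'],
                                   candidates[1]['l1'], candidates)
```

After the call, the list holds a third, bi-predictive candidate carrying mvL0_A/ref0 in list 0 and mvL1_B/ref0 in list 1.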
FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to the merge-mode candidate prediction motion vector list in an embodiment of this application. A scaled bi-predictive merge candidate prediction motion vector may be generated by scaling an original merge candidate prediction motion vector. Specifically, one candidate prediction motion vector from the original candidate prediction motion vectors (which may have mvL0_A and ref0, or mvL1_A and ref1) may be used to generate a bi-predictive merge candidate prediction motion vector. In the feasible implementation of FIG. 11, two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list. The prediction type of one candidate prediction motion vector is list 0 uni-prediction, and the prediction type of the other candidate prediction motion vector is list 1 uni-prediction. In this feasible implementation, mvL0_A and ref0 may be taken from list 0, and ref0 may be copied to the reference index ref0′ in list 1. Then, mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0′. The scaling may depend on the POC (picture order count) distance. Next, a bi-predictive merge candidate prediction motion vector (having mvL0_A and ref0 in list 0 and mvL0′_A and ref0′ in list 1) may be generated and checked to determine whether it is a duplicate. If it is not a duplicate, it may be added to the merge candidate prediction motion vector list.
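The POC-distance scaling of FIG. 11 can be sketched as a simple ratio of picture-order-count distances. Real codecs perform this in fixed-point arithmetic with clipping and rounding; this floating-point version only illustrates the proportionality, and the POC values are made up for the example:

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector by the ratio of POC distances: the distance
    from the current picture to the target reference (ref0') over the
    distance to the source reference (ref0). Yields mvL0'_A from mvL0_A."""
    num = poc_cur - poc_ref_dst
    den = poc_cur - poc_ref_src
    return (mv[0] * num / den, mv[1] * num / den)

# mvL0_A points to a reference 2 pictures away; the list 1 reference at the
# copied index ref0' is assumed to be 4 pictures away on the same side.
mv_l1 = scale_mv((6, -2), poc_cur=8, poc_ref_src=6, poc_ref_dst=4)
```

Here the target reference is twice as far, so each component is doubled: `mv_l1` evaluates to `(12.0, -4.0)`.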
FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to the merge-mode candidate prediction motion vector list in an embodiment of this application. A zero-vector merge candidate prediction motion vector may be generated by combining a zero vector with a reference index that can be referenced. If the zero-vector candidate prediction motion vector is not a duplicate, it may be added to the merge candidate prediction motion vector list. For each generated merge candidate prediction motion vector, its motion information may be compared with the motion information of the preceding candidate prediction motion vectors in the list.
In a feasible implementation, if a newly generated candidate prediction motion vector differs from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the generated candidate prediction motion vector is added to the merge candidate prediction motion vector list. The process of determining whether a candidate prediction motion vector differs from the candidate prediction motion vectors already included in the candidate prediction motion vector list is sometimes called pruning. With pruning, each newly generated candidate prediction motion vector may be compared with the existing candidate prediction motion vectors in the list. In some feasible implementations, the pruning operation may include comparing one or more new candidate prediction motion vectors with the candidate prediction motion vectors already in the candidate prediction motion vector list, and not adding any new candidate prediction motion vector that duplicates a candidate prediction motion vector already in the list. In other feasible implementations, the pruning operation may include adding one or more new candidate prediction motion vectors to the candidate prediction motion vector list and later removing duplicate candidate prediction motion vectors from the list.
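Both pruning variants described above can be sketched directly; candidates are modelled as comparable motion-information tuples, which is an illustrative simplification:

```python
def add_with_pruning(candidate_list, new_cand):
    """Variant 1: compare the new candidate against the existing list and
    add it only if it is not a duplicate."""
    if new_cand not in candidate_list:
        candidate_list.append(new_cand)
    return candidate_list

def prune(candidate_list):
    """Variant 2: add everything first, then remove duplicates afterwards,
    keeping the first occurrence of each entry and preserving order."""
    kept = []
    for cand in candidate_list:
        if cand not in kept:
            kept.append(cand)
    return kept
```

For example, `add_with_pruning([(1, 1)], (1, 1))` leaves the list unchanged, while `prune([(1, 1), (2, 2), (1, 1)])` collapses the trailing duplicate.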
In the various feasible implementations of FIG. 10 to FIG. 12 above, based on the technology of this application, the newly generated candidate predicted motion vectors may be treated as one category of candidate motion vectors, and an identification bit in the original candidate predicted motion vector list may indicate that category. During encoding, when the selected candidate motion vector is one of the newly generated candidate predicted motion vectors, the bitstream includes an identifier 1 indicating the category of newly generated candidates and an identifier 2 indicating the position of the selected candidate within that category. During decoding, the selected candidate motion vector is determined from the candidate predicted motion vector list according to identifier 1 and identifier 2, and the subsequent decoding process is performed.
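A minimal sketch of this two-level signalling, assuming a plain list representation on both sides; the function names and tuple values are illustrative, not taken from this application:

```python
def encode_selection(original_list, generated_list, selected):
    """Return (id1, id2): id1 = 0 for an original candidate, 1 for a
    newly generated one; id2 = position within the chosen category."""
    if selected in original_list:
        return 0, original_list.index(selected)
    return 1, generated_list.index(selected)

def decode_selection(original_list, generated_list, id1, id2):
    """Recover the selected candidate from the two identifiers."""
    return original_list[id2] if id1 == 0 else generated_list[id2]

orig = [(1, 2), (3, 4)]          # original candidates
gen = [(2, 3), (5, 5)]           # newly generated candidates
id1, id2 = encode_selection(orig, gen, (5, 5))
recovered = decode_selection(orig, gen, id1, id2)   # → (5, 5)
```

Both sides must build the two lists identically for the identifiers to resolve to the same candidate, which is why the list construction order is fixed between encoder and decoder.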
In the various feasible implementations of FIG. 5 to FIG. 7 and FIG. 9 to FIG. 12 above, the spatial candidate prediction modes exemplarily come from the five positions 252A to 252E shown in FIG. 8, that is, positions adjacent to the to-be-processed image block. On that basis, in some feasible implementations, the spatial candidate prediction modes may exemplarily further include positions that are within a preset distance of the to-be-processed image block but are not adjacent to it. Exemplarily, such positions may be shown as 252F to 252J in FIG. 13. It should be understood that FIG. 13 is an exemplary schematic diagram of a coding unit and the neighboring-position image blocks associated with it in an embodiment of this application. The positions of image blocks that are in the same image frame as the to-be-processed image block, that have been reconstructed by the time the to-be-processed image block is processed, and that are not adjacent to the to-be-processed image block all fall within the range of such positions.
Such positions may be referred to as spatially non-adjacent image blocks. Assume that a first spatially non-adjacent image block, a second spatially non-adjacent image block, and a third spatially non-adjacent image block are available, where the physical meaning of "available" is described above and is not repeated here. Also assume that, when the spatial candidate prediction modes are taken from the prediction modes of the positions shown in FIG. 8, the candidate prediction mode list is checked and constructed in the following order; it should be understood that the checking includes the "availability" check and the pruning process mentioned above, which are not repeated here. The candidate prediction mode list includes: the motion vector of the image block at position 252A, the motion vector of the image block at position 252B, the motion vector of the image block at position 252C, the motion vector of the image block at position 252D, the motion vector obtained by the alternative temporal motion vector prediction (ATMVP) technique, the motion vector of the image block at position 252E, and the motion vector obtained by the spatial-temporal motion vector prediction (STMVP) technique. The ATMVP and STMVP techniques are described in detail in sections 2.3.1.1 and 2.3.1.2 of JVET-G1001-v1, which is incorporated herein in its entirety and not repeated here. It should be understood that, exemplarily, the candidate prediction mode list includes the above seven predicted motion vectors; depending on the specific implementation, the list may contain fewer than seven predicted motion vectors, for example only the first five, or it may additionally include the motion vectors constructed in the feasible implementations of FIG. 10 to FIG. 12 described above so that it contains more predicted motion vectors. In a feasible implementation, the above first, second, and third spatially non-adjacent image blocks may be added to the candidate prediction mode list as predicted motion vectors of the to-be-processed image block. Further, denote the motion vector of the image block at position 252A, the motion vector of the image block at position 252B, the motion vector of the image block at position 252C, the motion vector of the image block at position 252D, the motion vector obtained by the ATMVP technique, the motion vector of the image block at position 252E, and the motion vector obtained by the STMVP technique as MVL, MVU, MVUR, MVDL, MVA, MVUL, and MVS, respectively, and denote the motion vectors of the first, second, and third spatially non-adjacent image blocks as MV0, MV1, and MV2, respectively. The candidate predicted motion vector list may then be checked and constructed in the following order:
Example 1: MVL, MVU, MVUR, MVDL, MV0, MV1, MV2, MVA, MVUL, MVS;
Example 2: MVL, MVU, MVUR, MVDL, MVA, MV0, MV1, MV2, MVUL, MVS;
Example 3: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
Example 4: MVL, MVU, MVUR, MVDL, MVA, MVUL, MVS, MV0, MV1, MV2;
Example 5: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MVS, MV2;
Example 6: MVL, MVU, MVUR, MVDL, MVA, MV0, MVUL, MV1, MV2, MVS;
Example 7: MVL, MVU, MVUR, MVDL, MVA, MVUL, MV0, MV1, MV2, MVS;
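The checking and construction described above — scan the sources in a fixed order, skip unavailable ones, prune duplicates, and stop at the preset list length — can be sketched as follows. The availability flags and motion vector values below are illustrative, not from any standard.

```python
def build_candidate_list(ordered_sources, max_len):
    """ordered_sources: list of (name, mv_or_None) pairs in checking
    order; None marks an unavailable source. Duplicate MVs are pruned,
    and the list is truncated at the preset length max_len."""
    out = []
    for name, mv in ordered_sources:
        if mv is None:                      # availability check
            continue
        if mv in [m for _, m in out]:       # pruning: duplicate MV
            continue
        out.append((name, mv))
        if len(out) == max_len:             # preset list length reached
            break
    return out

# Checking order of Example 1, with illustrative values:
sources = [("MVL", (1, 0)), ("MVU", (1, 0)),   # MVU duplicates MVL
           ("MVUR", None),                      # unavailable
           ("MVDL", (0, 2)), ("MV0", (4, 4)),
           ("MV1", (2, 2)), ("MV2", (7, 1)),
           ("MVA", (3, 3)), ("MVUL", (0, 1)), ("MVS", (6, 6))]
lst = build_candidate_list(sources, 5)
# → [("MVL",(1,0)), ("MVDL",(0,2)), ("MV0",(4,4)), ("MV1",(2,2)), ("MV2",(7,1))]
```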
It should be understood that the above candidate predicted motion vector list may be used in the Merge mode or the AMVP mode described above, or in other prediction modes for obtaining the predicted motion vector of the to-be-processed image block. It may be used at the encoder side, or at the decoder side in a manner consistent with the corresponding encoder side, without limitation. Likewise, the number of candidates in the candidate predicted motion vector list is preset and kept consistent at the encoder and decoder sides; the specific number is not limited.
It should be understood that Examples 1 to 7 exemplarily give several feasible compositions of the candidate predicted motion vector list. Based on the motion vectors of spatially non-adjacent image blocks, other compositions of the list and other arrangements of the candidates within the list are also possible, without limitation.
An embodiment of this application provides another method for constructing a candidate predicted motion vector list. Compared with the list-construction methods of Examples 1 to 7, this embodiment combines a candidate predicted motion vector determined in other embodiments with a preset vector difference to form a new candidate predicted motion vector, which overcomes the defect of low prediction accuracy of the predicted motion vector and improves coding efficiency.
In a feasible implementation of this application, as shown in FIG. 14A, the candidate predicted motion vector list of the to-be-processed image block includes two sub-lists: a first motion vector set and a vector difference set. The composition of the first motion vector set may follow any of the compositions described in the foregoing embodiments of this application, for example the composition of the candidate motion vector set in the Merge mode or the AMVP mode specified in the H.265 standard. The vector difference set includes one or more preset vector differences.
In some feasible implementations, each vector difference in the vector difference set is added to an original target motion vector determined from the first motion vector set, and the resulting vectors, together with the original target motion vector, form a new motion vector set.
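A sketch of this expansion, under assumed 2-D vector tuples and an illustrative preset vector difference set (the specific offsets below are not from this application):

```python
def expand_with_differences(original_mv, vector_diffs):
    """Return the new motion vector set: the original target MV plus
    one new MV per preset vector difference."""
    new_set = [original_mv]
    for dx, dy in vector_diffs:
        new_set.append((original_mv[0] + dx, original_mv[1] + dy))
    return new_set

diffs = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # illustrative preset set
new_mvs = expand_with_differences((4, -2), diffs)
# → [(4, -2), (5, -2), (3, -2), (4, -1), (4, -3)]
```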
In some feasible implementations, the vector difference set may be included in the candidate predicted motion vector list of FIG. 14A as a subset. An identification bit in the list (the MV computed from the vector difference set) indicates the vector difference set, and each vector difference within the set is indicated by an index. The candidate predicted motion vector list constructed in this way is shown in FIG. 14B.
It should be understood that the manner, provided by the technology of this application, of indicating one category of candidate motion vectors by an identification bit in the predicted motion vector list may be used in the Merge mode or the AMVP mode described above, or in other prediction modes for obtaining the predicted motion vector of the to-be-processed image block. It may be used at the encoder side, or at the decoder side in a manner consistent with the corresponding encoder side, without limitation. Likewise, the number of candidates in the candidate predicted motion vector list is preset and kept consistent at the encoder and decoder sides; the specific number is not limited.
The decoding method for predicted motion information provided in the embodiments of this application is described in detail below with reference to the accompanying drawings. According to the technology of the embodiments of this application, when the encoder side or the decoder side constructs a candidate motion information list, an identification bit in the list indicates one category of candidate motion information so as to control the length of the list. The decoding method for predicted motion information provided in the embodiments of this application is developed on this basis. The method is performed by a decoding apparatus, which may be the video decoder 200 in the video coding system 1 shown in FIG. 1, or a functional unit in the video decoder 200; this application does not specifically limit this.
FIG. 15 is a schematic flowchart of an embodiment of this application, relating to a decoding method for predicted motion information, which may specifically include the following steps.
S1501: The decoding apparatus parses a bitstream to obtain a first identifier.
As described above, the bitstream is sent by the encoder side after encoding the current image block. The first identifier indicates the position of the candidate motion information selected by the encoder side when encoding the current image block. The first identifier is used by the decoding apparatus to determine the selected candidate motion information and then predict the motion information of the to-be-processed image block.
In one possible implementation, the first identifier may be the specific index of the selected candidate motion information. In this case, the first identifier alone uniquely determines one piece of candidate motion information.
In another possible implementation, the first identifier may be the identifier of the category to which the selected candidate motion information belongs. In this case, the bitstream further includes a fourth identifier to indicate the specific position of the selected candidate motion information within its category.
It should be noted that the specific implementation of parsing the bitstream to obtain the identifiers is not specifically limited in this application, and the positions and forms of the first identifier and the fourth identifier in the bitstream are likewise not specifically limited in the embodiments of this application.
Optionally, the first identifier may use fixed-length coding. For example, the first identifier may be a 1-bit identifier, which can indicate only a limited number of categories.
Optionally, the first identifier may use variable-length coding.
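To illustrate the contrast between the two options, a truncated-unary code is one common variable-length scheme in which smaller (more probable) values get shorter codewords, unlike the equal-length codewords of fixed-length coding. This is a generic example, not the codeword assignment of this application:

```python
def truncated_unary(value, max_value):
    """Truncated unary code: 'value' one-bits followed by a terminating
    zero, with the terminator omitted when value == max_value."""
    bits = "1" * value
    if value < max_value:
        bits += "0"
    return bits

codes = [truncated_unary(v, 3) for v in range(4)]
# → ["0", "10", "110", "111"]: index 0 costs 1 bit, index 3 costs 3 bits,
# whereas a fixed-length code would spend 2 bits on every index.
```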
S1502: The decoding apparatus determines a target element from a first candidate set according to the first identifier.
Specifically, the content of the first candidate set may take either of the following two possible implementations.
Possible implementation 1: The elements of the first candidate set include at least one piece of first candidate motion information and at least one second candidate set, and the elements of the second candidate set include multiple pieces of second candidate motion information.
Possible implementation 2: The elements of the first candidate set may include at least one piece of first candidate motion information and multiple pieces of second candidate motion information, where the first candidate motion information includes first motion information and the second candidate motion information includes preset motion information offsets. New motion information can be generated from the first motion information and a preset motion information offset.
The first candidate set may be a constructed candidate motion information list. The first candidate set directly contains the at least one piece of first candidate motion information, while the multiple pieces of second candidate motion information are contained in the first candidate set in the form of a second candidate set.
In a feasible implementation, the second candidate motion information is different from the first candidate motion information.
Exemplarily, the first candidate motion information and the second candidate motion information included in each second candidate set may be candidate motion information determined using different MV prediction modes, or may be candidate motion information of different types; this is not specifically limited in the embodiments of this application.
For example, the first candidate motion information may be motion information obtained in the Merge manner, and the second candidate motion information may be motion information obtained in the Affine Merge manner.
For example, the first candidate motion information may be original candidate motion information, and the second candidate motion information may be motion information generated from the original candidate motion information.
Exemplarily, FIG. 16A and FIG. 16B illustrate two Merge candidate lists. In the Merge candidate list of FIG. 16A or FIG. 16B, an identification bit in the list is used to indicate a candidate motion information set. The identification bit may be located at any position in the list, which is not specifically limited in the embodiments of this application. For example, the identification bit may be located at the end of the list as shown in FIG. 16A, or in the middle of the list as shown in FIG. 16B. When the first identifier in the bitstream indicates the identification bit, the target element is determined to be the candidate motion information set indicated by that identification bit, where the set includes multiple pieces of second candidate motion information. Then, within the candidate motion information set pointed to by the identification bit, one piece of candidate motion information is selected according to a further identifier (the second identifier in S1504) as the target motion information, which is used to predict the motion information of the to-be-processed image block.
Exemplarily, again with reference to the two Merge candidate lists of FIG. 16A and FIG. 16B, when the first identifier in the bitstream indicates the identification bit, the target element is determined to be the multiple pieces of second candidate motion information indicated by that identification bit, where the second candidate motion information includes preset motion information offsets. Then, among the multiple pieces of second candidate motion information pointed to by the identification bit, one piece is selected according to a further identifier (the second identifier in S1504), and the target motion information used to predict the motion information of the to-be-processed image block is determined based on the selected second candidate motion information.
In another feasible implementation, as shown in FIG. 16C, more than one identification bit is added to the Merge candidate list, and each identification bit points to a specific candidate motion information set or to multiple pieces of motion information that include preset motion information offsets. When the first identifier in the bitstream indicates a certain identification bit, the target element is determined to be candidate motion information in the candidate motion information set indicated by that identification bit, or the target motion information is determined according to one of the multiple pieces of candidate motion information (including preset motion information offsets) indicated by that identification bit.
By introducing identification bits (pointers) into the Merge list, FIG. 16A, FIG. 16B, and FIG. 16C introduce candidates in the form of subsets. When many candidates are introduced, this greatly reduces the length of the candidate list and the complexity of list reconstruction, which helps simplify hardware implementation.
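The flag-bit (pointer) mechanism of FIGS. 16A to 16C can be sketched as a two-stage lookup: the first identifier indexes the short Merge list, and only when it lands on a flag entry is a second identifier needed to select within the set that the flag points to. The entry encoding and the sample values here are assumptions for illustration:

```python
def resolve(merge_list, subsets, first_id, second_id=None):
    """merge_list entries are either ('mv', motion_info) or
    ('flag', subset_name); subsets maps each name to its candidate set."""
    kind, payload = merge_list[first_id]
    if kind == "mv":
        return payload                      # direct candidate (cf. S1503)
    return subsets[payload][second_id]      # flag entry (cf. S1504)

# A 3-entry list standing in for many candidates: two direct candidates
# plus one flag pointing at a 3-element subset.
merge_list = [("mv", (1, 1)), ("mv", (2, 0)), ("flag", "offsets")]
subsets = {"offsets": [(3, 3), (4, 4), (5, 5)]}
a = resolve(merge_list, subsets, 0)         # → (1, 1)
b = resolve(merge_list, subsets, 2, 1)      # → (4, 4)
```

Five selectable candidates are reachable through a list of length three, which is the length reduction the paragraph above describes.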
In a feasible implementation, the first candidate motion information may include motion information of spatially adjacent image blocks of the to-be-processed image block. It should be noted that the definition of the motion information of spatially adjacent image blocks has been given above and is not repeated here.
In a feasible implementation, the second candidate motion information may include motion information of spatially non-adjacent image blocks of the to-be-processed image block. It should be noted that the definition of the motion information of spatially non-adjacent image blocks has been given above and is not repeated here.
The manner of obtaining the first motion information may be chosen according to actual requirements and is not specifically limited in the embodiments of this application. The value of the preset motion information offset used to obtain the second motion information may be a fixed value or a value selected from a set; the embodiments of this application do not specifically limit the content or form of the preset motion information offset.
In a feasible implementation, the first candidate motion information includes first motion information, the at least one second candidate set is multiple second candidate sets, and the multiple second candidate sets include at least one third candidate set and at least one fourth candidate set. The elements of the third candidate set include motion information of multiple spatially non-adjacent image blocks of the to-be-processed image block, and the elements of the fourth candidate set include multiple pieces of motion information obtained based on the first motion information and preset motion information offsets.
In a feasible implementation, the at least one second candidate set is multiple second candidate sets, and the multiple second candidate sets include at least one fifth candidate set and at least one sixth candidate set. The elements of the fifth candidate set include motion information of multiple spatially non-adjacent image blocks of the to-be-processed image block, and the elements of the sixth candidate set include multiple preset motion information offsets.
In a feasible implementation, among the at least one piece of first candidate motion information, the codeword used to identify the first motion information is the shortest.
In a feasible implementation, the first motion information does not include motion information obtained according to the ATMVP mode.
As described in S1501, the first identifier may be an index into the first candidate set, or may be the identifier of the category of the motion information. Depending on the specific content, S1502 can be implemented in the following two cases.
Case 1: The first identifier is an index into the first candidate set.
In one possible implementation of Case 1, in S1502 the decoding apparatus may determine the element at the position indicated by the first identifier in the first candidate set as the target element. Because the first candidate set includes at least one piece of first candidate motion information and at least one second candidate set, the target element determined according to the first identifier may be first candidate motion information or may be a second candidate set, depending on what occupies the position indicated by the first identifier.
In another possible implementation of Case 1, in S1502 the decoding apparatus may determine the element at the position indicated by the first identifier in the first candidate set as the target element. Because the first candidate set includes at least one piece of first candidate motion information and multiple pieces of second candidate motion information, the target element determined according to the first identifier may be first candidate motion information, or may be obtained from the multiple pieces of second candidate motion information, depending on what occupies the position indicated by the first identifier.
Case 2: The first identifier is the identifier of a candidate motion information category.
In Case 2, in S1502 the decoding apparatus determines the category to which the target element belongs according to the first identifier. The decoding apparatus then parses the bitstream to obtain a fourth identifier, which indicates the specific position of the target element within its category, and uniquely determines the target element within that category according to the fourth identifier. Specifically, if the first identifier indicates that the target element belongs to the category of first candidate motion information, one piece of first candidate motion information is determined as the target element among the at least one piece of first candidate motion information according to the fourth identifier. If the first identifier indicates that the target element belongs to some category of second candidate motion information, a second candidate set or a piece of second candidate motion information is determined as the target element according to the fourth identifier.
Exemplarily, assume the first candidate motion information is Merge motion information and the first candidate set includes two second candidate sets, where the second candidate motion information in one second candidate set is Affine Merge motion information of a first type and the second candidate motion information in the other second candidate set is Affine Merge motion information of a second type. An identifier value of 0 is configured to indicate Merge motion information, and 1 to indicate Affine Merge motion information. If the first identifier obtained by the decoding apparatus by parsing the bitstream in S1501 is 0, then in S1502 the decoding apparatus further parses the bitstream to obtain a fourth identifier and, according to the fourth identifier, determines one piece of Merge motion information as the target element among the at least one piece of Merge motion information in the first candidate set. If the first identifier obtained in S1501 is 1, then in S1502 the decoding apparatus further parses the bitstream to obtain a fourth identifier and, according to the fourth identifier, determines one of the two second candidate sets as the target element.
Exemplarily, assume the first candidate motion information is Merge motion information and the first candidate set includes two second candidate sets, where the second candidate motion information in one second candidate set is the preset motion information offsets corresponding to Affine Merge motion information of a first type, and the second candidate motion information in the other second candidate set is the preset motion information offsets of Affine Merge motion information of a second type. An identifier value of 0 is configured to indicate Merge motion information, and 1 to indicate Affine Merge motion information. If the first identifier obtained by the decoding apparatus by parsing the bitstream in S1501 is 0, then in S1502 the decoding apparatus further parses the bitstream to obtain a fourth identifier and, according to the fourth identifier, determines one piece of Merge motion information as the target element among the at least one piece of Merge motion information in the first candidate set. If the first identifier obtained in S1501 is 1, then in S1502 the decoding apparatus further parses the bitstream to obtain a fourth identifier, determines one of the two second candidate sets according to the fourth identifier, and determines the target element based on one piece of second candidate motion information in the determined second candidate set.
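The Case 2 resolution in the two examples above — a category flag (0 for Merge, 1 for Affine Merge) followed by a fourth identifier giving the position within the chosen category — can be sketched as follows. The list contents are placeholders, not real motion information:

```python
def determine_target(first_candidates, second_sets, first_id, fourth_id):
    """first_candidates: the Merge motion information in the first
    candidate set; second_sets: the second candidate sets. first_id
    selects the category, fourth_id the position within it."""
    if first_id == 0:                      # category: Merge motion info
        return first_candidates[fourth_id]
    return second_sets[fourth_id]          # category: Affine Merge sets

merges = [(1, 0), (0, 2)]                          # illustrative values
affine_sets = [["affine-type-1"], ["affine-type-2"]]
x = determine_target(merges, affine_sets, 0, 1)    # → (0, 2)
y = determine_target(merges, affine_sets, 1, 0)    # → ["affine-type-1"]
```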
Optionally, in S1502, if the decoding apparatus determines that the target element is the first candidate motion information, S1503 is performed; if the decoding apparatus determines that the target element is a second candidate set, or is to be obtained from multiple pieces of second candidate motion information, S1504 is performed.
S1503: When the target element is the first candidate motion information, use the first candidate motion information as the target motion information.
The target motion information is used to predict the motion information of the to-be-processed image block.
Optionally, using the target motion information to predict the motion information of the to-be-processed image block may be specifically implemented as: using the target motion information as the motion information of the to-be-processed image block; or using the target motion information as the predicted motion information of the to-be-processed image block. In practical applications, the specific implementation may be selected according to actual requirements, which is not specifically limited here.
Further, the subsequent processing of the to-be-processed image block has been described in detail above and is not repeated here.
S1504: Parse the bitstream to obtain a second identifier, and determine the target motion information based on one of the multiple pieces of second candidate motion information according to the second identifier.
S1504 may be specifically implemented as: parsing the bitstream to obtain the second identifier, and determining, according to the second identifier, the target motion information from the multiple pieces of second candidate motion information.
It should be noted that the specific implementation of parsing the bitstream to obtain an identifier is not specifically limited in this application, and neither the position nor the form of the second identifier in the bitstream is specifically limited in the embodiments of this application.
Optionally, the second identifier may use fixed-length coding. For example, the second identifier may be a 1-bit identifier, which can indicate only a limited number of cases.
Optionally, the second identifier may use variable-length coding. For example, the second identifier may consist of multiple bits.
Optionally, depending on the content of the second candidate motion information, determining the target motion information in S1504 according to the second identifier, based on one of the multiple pieces of second candidate motion information, may be implemented in one of the following feasible manners, but is not limited thereto.
In a feasible implementation, the first candidate motion information includes first motion information, the second candidate motion information includes second motion information, and the second motion information is obtained based on the first motion information and a preset motion information offset. In this manner, the second identifier may indicate the specific position of the target motion information in the second candidate set, and determining the target motion information from the multiple pieces of second candidate motion information in S1504 may be specifically implemented as: determining, as the target motion information, the second candidate motion information at the position indicated by the second identifier in the second candidate set serving as the target element.
In a feasible implementation, the first candidate motion information includes first motion information, and the second candidate motion information includes preset motion information offsets. In this manner, the second identifier indicates the specific position of a target offset in the second candidate set, and S1504 may be specifically implemented as: determining the target offset from the multiple preset motion information offsets according to the second identifier; and determining the target motion information based on the first motion information and the target offset.
In a feasible implementation, when the first candidate motion information includes first motion information and the second candidate motion information includes preset motion information offsets, before the target offset is determined from the multiple preset motion information offsets according to the second identifier, the decoding method for predicted motion information provided in this application may further include: multiplying the multiple preset motion information offsets by a preset coefficient to obtain multiple adjusted motion information offsets. Correspondingly, determining the target offset from the multiple preset motion information offsets according to the second identifier includes: determining the target offset from the multiple adjusted motion information offsets according to the second identifier.
In a feasible implementation, when the first candidate motion information includes first motion information and the second candidate motion information includes preset motion information offsets, S1504 may be specifically implemented as: determining one motion information offset from the multiple preset motion information offsets according to the second identifier and multiplying it by a preset coefficient to obtain the target offset; and determining the target motion information based on the first motion information and the target offset.
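The last two implementations above (selecting a preset offset by the second identifier and scaling it by a preset coefficient) can be sketched as follows. This is a minimal illustration only; the names, the offset table, and the coefficient handling are assumptions, not the application's normative procedure:

```python
# Illustrative sketch of S1504 with a preset coefficient (hypothetical names):
# the second identifier selects one preset offset, the preset coefficient
# scales it, and the target motion information is the first motion
# information plus the scaled offset.

PRESET_OFFSETS = [(1, 0), (0, -1), (-1, 0), (0, 1)]

def derive_target_motion(first_mv, second_id, coeff=1):
    """Pick the offset indicated by second_id, scale it, add it to first_mv."""
    off_x, off_y = PRESET_OFFSETS[second_id]
    return (first_mv[0] + off_x * coeff, first_mv[1] + off_y * coeff)

# second identifier 1 selects (0, -1); with coefficient 2 the target motion
# vector becomes (2, -3) + (0, -2) = (2, -5)
mv = derive_target_motion((2, -3), 1, coeff=2)
```

With `coeff=1` this reduces to the unscaled implementation; whether the coefficient is fixed in the decoder or carried in the bitstream (the third identifier below) does not change the arithmetic.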
It should be noted that the preset coefficient may be a fixed coefficient configured in the decoding apparatus, or may be a coefficient carried in the bitstream, which is not specifically limited in the embodiments of this application.
Further optionally, when the preset coefficient is carried in the bitstream, the decoding method for predicted motion information provided in this application may further include S1505.
S1505: Parse the bitstream to obtain a third identifier.
The third identifier includes the preset coefficient.
With the decoding method for predicted motion information provided in this application, the elements in the first candidate set include the first candidate motion information and at least one second candidate set, or the first candidate motion information and multiple pieces of second candidate motion information. With this multi-layer candidate set structure, when more candidates are introduced, a whole class of candidate motion information can be added to the first candidate set as a single element, which greatly shortens the first candidate set compared with adding each piece of candidate motion information to it directly. When the first candidate set is a candidate motion information list for inter prediction, the length of the list can be well controlled even when more candidates are introduced, which facilitates the checking process and hardware implementation.
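As a rough illustration of the two-level structure (the names and the 7-entry layout are assumptions for the example, not the application's normative design), a first-level slot can hold an entire second-level candidate set:

```python
# One first-level slot holds a whole second-level candidate set, so four
# extra AFFINE candidates cost only one entry of first-level list length.
first_level = ["merge0", "merge1", "merge2", "merge3", "merge4", "merge5",
               ["affine0", "affine1", "affine2", "affine3"]]

def resolve(first_idx, second_idx=None):
    entry = first_level[first_idx]
    if isinstance(entry, list):      # target element is a second candidate set
        return entry[second_idx]     # second identifier picks within the set
    return entry                     # target element is first candidate motion info
```

The first-level list length stays 7 even though 10 candidates are reachable, which is the length control the paragraph above describes.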
Exemplarily, the following are several specific implementations of the embodiments of this application:
Embodiment 1:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. The candidate motion information corresponding to first indices 0-5 includes a motion vector and a reference picture, and first index 6 corresponds to new motion information generated based on the candidate motion information corresponding to index 0 and a preset motion vector offset. Assume that the candidate motion information corresponding to first index 0 is forward prediction, with motion vector (2, -3) and reference frame POC 2. The preset motion vector offsets are (1, 0), (0, -1), (-1, 0), (0, 1). When the first index value obtained by parsing the bitstream is 6, indicating that the motion information of the current image block is new motion information generated based on the candidate motion information corresponding to index 0 and a preset motion vector offset, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of the current image block is forward prediction, the motion vector is (2, -3) + (0, -1) = (2, -4), and the reference frame POC is 2.
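The arithmetic of Embodiment 1 can be reproduced with a short sketch; the data layout and names are illustrative assumptions (only the slot actually used is populated):

```python
PRESET_OFFSETS = [(1, 0), (0, -1), (-1, 0), (0, 1)]
merge_list = [{"mv": (2, -3), "ref_poc": 2}]  # first index 0; slots 1-5 omitted

def decode_embodiment1(first_index, second_index=None):
    if first_index <= 5:
        return merge_list[first_index]    # ordinary Merge candidate
    base = merge_list[0]                  # first index 6: offset applied to index 0
    ox, oy = PRESET_OFFSETS[second_index]
    return {"mv": (base["mv"][0] + ox, base["mv"][1] + oy),
            "ref_poc": base["ref_poc"]}

# first index 6, second index 1 -> (2, -3) + (0, -1) = (2, -4), ref POC 2
info = decode_embodiment1(6, 1)
```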
Embodiment 2:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. The candidate motion information corresponding to first indices 0-5 includes a motion vector and a reference picture, and first index 6 corresponds to new motion information generated based on the candidate motion information corresponding to first index 0 and a preset motion vector offset. Assume that the candidate motion information corresponding to first index 0 is bidirectional prediction, with forward motion vector (2, -3) and reference frame POC 2, and backward motion vector (-2, -1) and reference frame POC 4. The preset motion vector offsets are (1, 0), (0, -1), (-1, 0), (0, 1). When the first index value obtained by parsing the bitstream is 6, indicating that the motion information of the current image block is new motion information generated based on the candidate motion information corresponding to index 0 and a preset motion vector offset, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 0, the motion information of the current image block is bidirectional prediction. When the POC of the current frame is 3, the forward and backward reference frame POCs lie on opposite sides of the current frame POC: the forward motion vector is (2, -3) + (1, 0) = (3, -3) with reference frame POC 2, and the backward motion vector is (-2, -1) - (1, 0) = (-3, -1) with reference frame POC 4. When the POC of the current frame is 6, the forward and backward reference frame POCs lie on the same side of the current frame POC: the forward motion vector is (2, -3) + (1, 0) = (3, -3) with reference frame POC 2, and the backward motion vector is (-2, -1) + (1, 0) = (-1, -1) with reference frame POC 4.
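The POC-dependent sign handling in Embodiment 2 can be sketched as follows. This is one simplified reading of the example (the offset is mirrored onto the backward motion vector when the two references straddle the current picture, and copied when they lie on the same side); the function name and data shapes are hypothetical:

```python
def apply_bi_offset(fwd_mv, fwd_poc, bwd_mv, bwd_poc, cur_poc, off):
    """Add the offset to the forward MV; mirror or copy it onto the backward
    MV depending on whether both references lie on the same side of cur_poc."""
    new_fwd = (fwd_mv[0] + off[0], fwd_mv[1] + off[1])
    same_side = (fwd_poc - cur_poc) * (bwd_poc - cur_poc) > 0
    sign = 1 if same_side else -1
    new_bwd = (bwd_mv[0] + sign * off[0], bwd_mv[1] + sign * off[1])
    return new_fwd, new_bwd

# current POC 3: references at POC 2 and 4 straddle it -> mirrored offset
f, b = apply_bi_offset((2, -3), 2, (-2, -1), 4, 3, (1, 0))
# current POC 6: both references precede it -> same-sign offset
f2, b2 = apply_bi_offset((2, -3), 2, (-2, -1), 4, 6, (1, 0))
```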
Embodiment 3:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. The candidate motion information corresponding to first indices 0-5 includes a motion vector and a reference picture. Assume that the candidate motion information indicated by first index 0 is composed of sub-block motion information, while the candidate motion information corresponding to first index 1 is not composed of sub-block motion information and is forward prediction, with motion vector (2, -3) and reference frame POC 2. First index 6 corresponds to new motion information generated based on the candidate motion information corresponding to first index 1 and a preset motion vector offset; the preset motion vector offsets are (1, 0), (0, -1), (-1, 0), (0, 1). When the first index value obtained by parsing the bitstream is 6, indicating that the motion information of the current image block is new motion information generated based on the candidate motion information corresponding to first index 1 and a preset motion vector offset, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of the current block is forward prediction, the motion vector is (2, -3) + (0, -1) = (2, -4), and the reference frame POC is 2.
Embodiment 4:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. First index 6 indicates that the current block uses the motion information of a non-adjacent spatial candidate as its reference motion information. Assume that the size of the non-adjacent spatial candidate set is 4, that the available non-adjacent spatial candidates are placed into the set in a preset checking order, and that the non-adjacent spatial candidate motion information in the set is as follows:
Second index 0: candidate 0: forward prediction, motion vector (2, -3), reference frame POC 2.
Second index 1: candidate 1: forward prediction, motion vector (1, -3), reference frame POC 4.
Second index 2: candidate 2: backward prediction, motion vector (2, -4), reference frame POC 2.
Second index 3: candidate 3: bidirectional prediction, forward motion vector (2, -3) with reference frame POC 2, backward motion vector (2, -2) with reference frame POC 4.
When the first index value obtained by decoding is 6, indicating that the current block uses the motion information of a non-adjacent spatial candidate as its reference motion information, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of candidate 1 in the non-adjacent spatial candidate set is used as the motion information of the current block.
Embodiment 5:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. The candidate motion information corresponding to first index 0 is forward prediction, with motion vector (2, -3) and reference frame POC 2. First index 6 indicates that the reference motion information of the current block is either new motion information generated based on the candidate motion information corresponding to first index 0, or the motion information of a non-adjacent spatial candidate. Assume that the size of the non-adjacent spatial candidate set is 4, that the available non-adjacent spatial candidates are placed into the set in a preset checking order, and that the non-adjacent spatial candidate motion information in the set is as follows:
Second index 0: candidate 0: forward prediction, motion vector (-5, -3), reference frame POC 2.
Second index 1: candidate 1: forward prediction, motion vector (1, -3), reference frame POC 4.
Second index 2: candidate 2: backward prediction, motion vector (2, -4), reference frame POC 2.
Second index 3: candidate 3: bidirectional prediction, forward motion vector (2, -3) with reference frame POC 2, backward motion vector (2, -2) with reference frame POC 4.
From the candidate motion information corresponding to first index 0 and the preset motion vector offsets (1, 0), (0, -1), (-1, 0), (0, 1), another four candidates are obtained:
Second index 4: candidate 4: forward prediction, motion vector (2, -3) + (1, 0), reference frame POC 2.
Second index 5: candidate 5: forward prediction, motion vector (2, -3) + (0, -1), reference frame POC 2.
Second index 6: candidate 6: forward prediction, motion vector (2, -3) + (-1, 0), reference frame POC 2.
Second index 7: candidate 7: forward prediction, motion vector (2, -3) + (0, 1), reference frame POC 2.
When the first index value obtained by decoding is 6, indicating that the current block uses, as its reference motion information, either new motion information generated based on the candidate motion information corresponding to first index 0 or the motion information of a non-adjacent spatial candidate, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 0, the motion information of candidate 0 in the non-adjacent spatial candidate set (forward prediction, motion vector (-5, -3), reference frame POC 2) is used as the motion information of the current block. When the second index value obtained by further decoding is 5, motion-vector-offset candidate 5 (forward prediction, motion vector (2, -3) + (0, -1), reference frame POC 2) is used as the motion information of the current block.
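A sketch of the combined second-level set in Embodiment 5, where second indices 0-3 select non-adjacent spatial candidates and 4-7 select offset-derived candidates. Only forward components are modeled (candidate 3's backward part is omitted for brevity) and all names are illustrative:

```python
BASE = {"mv": (2, -3), "ref_poc": 2}             # first-index-0 candidate
NON_ADJACENT = [{"mv": (-5, -3), "ref_poc": 2},  # second indices 0-3
                {"mv": (1, -3),  "ref_poc": 4},
                {"mv": (2, -4),  "ref_poc": 2},
                {"mv": (2, -3),  "ref_poc": 2}]  # fwd part of bi candidate 3
OFFSETS = [(1, 0), (0, -1), (-1, 0), (0, 1)]     # second indices 4-7

def second_level(second_idx):
    if second_idx < len(NON_ADJACENT):
        return NON_ADJACENT[second_idx]          # non-adjacent spatial candidate
    ox, oy = OFFSETS[second_idx - len(NON_ADJACENT)]
    return {"mv": (BASE["mv"][0] + ox, BASE["mv"][1] + oy),
            "ref_poc": BASE["ref_poc"]}

# second index 0 -> mv (-5, -3); second index 5 -> (2, -3) + (0, -1) = (2, -4)
```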
Embodiment 6:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. The candidate motion information corresponding to first index 0 is forward prediction, with motion vector (2, -3) and reference frame POC 2. First index 6 indicates that the motion information of the current block is new motion information generated based on the candidate motion information corresponding to first index 0, according to the preset motion vector offsets:
(1, 0), (0, -1), (-1, 0), (0, 1);
(2, 0), (0, -2), (-2, 0), (0, 2);
A second index value of 0 indicates the candidates with spacing 1, and a value of 1 indicates the candidates with spacing 2; the third index value indicates the candidate index of the motion vector offset. When the first index value obtained by decoding is 6, indicating that the motion information of the current block is new motion information generated based on the candidate motion information corresponding to first index 0, decoding continues to obtain the second and third index values. When the second and third index values obtained by further decoding are 1 and 3 respectively, the offset motion vector with spacing 2 and index 2, i.e. (-2, 0), is selected. The motion information of the current block is then forward prediction, the motion vector is (2, -3) + (-2, 0) = (0, -3), and the reference frame POC is 2.
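The two-index offset selection of Embodiment 6 can be sketched as follows. The 0-based mapping of the third index onto the direction list is an assumption made for illustration (the embodiment's own index convention is not fully spelled out), so the example uses direction index 2 to reach the offset (-2, 0):

```python
DIRECTIONS = [(1, 0), (0, -1), (-1, 0), (0, 1)]

def select_offset(second_idx, third_idx):
    spacing = 1 << second_idx        # second index 0 -> spacing 1, 1 -> spacing 2
    dx, dy = DIRECTIONS[third_idx]   # third index picks the direction (assumed 0-based)
    return (dx * spacing, dy * spacing)

def apply(base_mv, second_idx, third_idx):
    ox, oy = select_offset(second_idx, third_idx)
    return (base_mv[0] + ox, base_mv[1] + oy)

# spacing 2, direction (-1, 0) gives offset (-2, 0): (2, -3) + (-2, 0) = (0, -3)
mv = apply((2, -3), 1, 2)
```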
Embodiment 7:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. First index 6 indicates that the current block uses one of the candidates in the AFFINE-derived motion information candidate set as its reference motion information. Assume that the AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
Second index 0: AFFINE candidate 0;
Second index 1: AFFINE candidate 1;
Second index 2: AFFINE candidate 2;
Second index 3: AFFINE candidate 3;
When the first index value obtained by decoding is 6, indicating that the current block uses one of the candidates in the AFFINE-derived motion information candidate set as its reference motion information, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of AFFINE candidate 1 is used as the motion information of the current block.
Embodiment 8:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. First index 6 indicates that the current block uses one of the candidates in the motion information candidate set obtained from adjacent spatial candidates as its reference motion information. Assume that the adjacent spatial motion information candidate set includes 4 adjacent spatial motion information candidates:
Second index 0: adjacent spatial candidate 0;
Second index 1: adjacent spatial candidate 1;
Second index 2: adjacent spatial candidate 2;
Second index 3: adjacent spatial candidate 3;
When the first index value obtained by decoding is 6, indicating that the current block uses one of the candidates in the motion information candidate set obtained from adjacent spatial candidates as its reference motion information, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of adjacent spatial candidate 1 is used as the motion information of the current block.
Embodiment 9:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. First index 6 indicates that the current block uses one of the candidates in the motion information candidate set obtained from adjacent temporal candidates as its reference motion information. Assume that the adjacent temporal motion information candidate set includes 4 adjacent temporal motion information candidates:
Second index 0: adjacent temporal candidate 0;
Second index 1: adjacent temporal candidate 1;
Second index 2: adjacent temporal candidate 2;
Second index 3: adjacent temporal candidate 3;
When the first index value obtained by decoding is 6, indicating that the current block uses one of the candidates in the motion information candidate set obtained from adjacent temporal candidates as its reference motion information, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of adjacent temporal candidate 1 is used as the motion information of the current block.
Embodiment 10:
Assume that the maximum length of the Merge candidate list is 7, and first indices 0-6 indicate the candidate slots in the Merge list. First index 6 indicates that the current block uses one of the candidates in the motion information candidate set composed of sub-block motion information as its reference motion information. Assume that the motion information candidate set composed of sub-block motion information includes AFFINE, ATMVP, and STMVP candidates:
Second index 0: AFFINE candidate;
Second index 1: ATMVP candidate;
Second index 2: STMVP candidate;
When the first index value obtained by decoding is 6, indicating that the current block uses one of the candidates in the motion information candidate set composed of sub-block motion information as its reference motion information, decoding continues to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of the ATMVP candidate is used as the motion information of the current block.
Embodiment 11:
In the Merge candidate space, slots 0-5 of the list hold motion information obtained by Merge, and slot 6 holds the AFFINE-derived motion information candidate set. Assume that first index 0 indicates that the current block uses motion information obtained by Merge as its reference motion information, and first index 1 indicates that the current block uses one of the candidates in the AFFINE-derived motion information candidate set as its reference motion information. Assume that the AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
Second index 0: AFFINE candidate 0;
Second index 1: AFFINE candidate 1;
Second index 2: AFFINE candidate 2;
Second index 3: AFFINE candidate 3;
In one case, when the first index value obtained by decoding is 1, indicating that the current block uses one of the candidates in the AFFINE-derived motion information candidate set as its reference motion information, decoding continues to obtain a second identifier value. When the second identifier value obtained by further decoding is 1, the motion information of AFFINE candidate 1 is used as the motion information of the current block.
In another case, when the first index value obtained by decoding is 0, indicating that the current block uses motion information obtained by Merge as its reference motion information, decoding continues to obtain a fourth index. When the fourth index value obtained by further decoding is 2, the motion information in slot 2 of the Merge candidate list is used as the motion information of the current block.
实施例12:Example 12:
在Merge候选空间中,该列表中空间0-3为采用Merge得到的运动信息,空间4为采用相邻时域得到的运动信息候选集合,空间5为由子块运动信息所构成的运动信息候选集合,空间6为AFFINE得到的运动信息候选集合。设第一索引0指示当前块采用Merge得到的运动信息为参考运动信息,第一索引1指示了指示当前块采用 AFFINE得到的运动信息候选集合中的其中一个候选为参考运动信息,第一索引01指示了指示当前块采用相邻时域得到的运动信息候选集合中的其中一个候选为参考运动信息;第一索引11指示了当前块采用由子块运动信息所构成的运动信息候选集合中的其中一个候选为参考运动信息。In the Merge candidate space, spaces 0-3 in the list are motion information obtained using Merge, space 4 is a motion information candidate set obtained using adjacent time domains, and space 5 is a motion information candidate set composed of sub-block motion information. Space 6 is the candidate set of motion information obtained by AFFINE. Let the first index 0 indicate that the current block uses the motion information obtained by Merge as the reference motion information, and the first index 1 indicates that one of the candidates in the motion information candidate set obtained by the current block using AFFINE is the reference motion information, and the first index 01 Indicates that one of the candidates of the motion information candidate set obtained by using the adjacent time domain to the current block is reference motion information; the first index 11 indicates that the current block uses one of the motion information candidate sets composed of the sub-block motion information; Candidates are reference motion information.
设AFFINE运动信息候选集合包括4个AFFINE运动信息候选:Suppose the AFFINE motion information candidate set includes 4 AFFINE motion information candidates:
第二标识0:AFFINE候选0;Second identification 0: AFFINE candidate 0;
第二标识1:AFFINE候选1;Second identification 1: AFFINE candidate 1;
第二标识2:AFFINE候选2;Second identifier 2: AFFINE candidate 2;
第二标识3:AFFINE候选3;Second identification 3: AFFINE candidate 3;
设相邻时域运动信息候选集合包括4个相邻时域运动信息候选:Assume that the adjacent temporal motion information candidate set includes four adjacent temporal motion information candidates:
第二索引0:相邻时域候选0;Second index 0: adjacent time domain candidate 0;
第二索引1:相邻时域候选1;Second index 1: adjacent time domain candidate 1;
第二索引2:相邻时域候选2;Second index 2: Adjacent time domain candidate 2;
第二索引3:相邻时域候选3;Second index 3: adjacent time domain candidate 3;
设由子块运动信息所构成的运动信息候选集合包括AFFINE运动信息候选、ATMVP、STMVP候选:It is assumed that the motion information candidate set composed of sub-block motion information includes AFFINE motion information candidates, ATMVP, and STMVP candidates:
第二索引0:AFFINE候选;Second index 0: AFFINE candidate;
第二索引1:ATMVP候选;Second index 1: ATMVP candidate;
第二索引2:STMVP候选;Second index 2: STMVP candidates;
一种情况，当解码得到第一索引值为0时，表明当前块采用Merge得到的运动信息为参考运动信息，则进一步解码以获取第四索引。进一步解码得到的第四索引值为2时，则将Merge候选列表中空间2的运动信息作为当前块的运动信息。In one case, when the first index value obtained by decoding is 0, it indicates that the current block uses the motion information obtained by Merge as the reference motion information, and the bitstream is further decoded to obtain a fourth index. When the fourth index value obtained by further decoding is 2, the motion information of space 2 in the Merge candidate list is used as the motion information of the current block.
一种情况，当解码得到第一索引值为1时，表明当前块采用AFFINE得到的运动信息候选集合中的其中一个候选为参考运动信息，则进一步解码以获取第二标识值。进一步解码得到的第二标识值为1时，则将AFFINE候选1的运动信息作为当前块的运动信息。In one case, when the first index value obtained by decoding is 1, it indicates that one of the candidates in the motion information candidate set obtained by AFFINE is used as the reference motion information of the current block, and the bitstream is further decoded to obtain a second identifier value. When the second identifier value obtained by further decoding is 1, the motion information of AFFINE candidate 1 is used as the motion information of the current block.
一种情况，当解码得到第一索引值为01时，表明当前块采用相邻时域得到的运动信息候选集合中的其中一个候选为参考运动信息，则进一步解码以获取第二标识值。进一步解码得到的第二标识值为2时，则将相邻时域候选2的运动信息作为当前块的运动信息。In one case, when the first index value obtained by decoding is 01, it indicates that one of the candidates in the motion information candidate set obtained from adjacent temporal blocks is used as the reference motion information of the current block, and the bitstream is further decoded to obtain a second identifier value. When the second identifier value obtained by further decoding is 2, the motion information of adjacent temporal candidate 2 is used as the motion information of the current block.
一种情况，当解码得到第一索引值为11时，表明当前块采用由子块运动信息所构成的运动信息候选集合中的其中一个候选为参考运动信息，则进一步解码以获取第二索引值。进一步解码得到的第二索引值为1时，则将ATMVP候选的运动信息作为当前块的运动信息。In one case, when the first index value obtained by decoding is 11, it indicates that the current block uses one of the candidates in the motion information candidate set composed of sub-block motion information as the reference motion information, and the bitstream is further decoded to obtain a second index value. When the second index value obtained by further decoding is 1, the motion information of the ATMVP candidate is used as the motion information of the current block.
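The Embodiment 12 mapping from the first index to a candidate source, and from the second identifier or index to a candidate within that source, can be tabulated as in the following non-authoritative sketch; the candidate names are placeholders, not values from the embodiment.

```python
# Each first-index codeword selects a candidate source; the second identifier
# or second index then selects one candidate within that source.
CANDIDATE_SOURCES = {
    "0":  ["space 0", "space 1", "space 2", "space 3"],            # Merge list
    "1":  ["AFFINE 0", "AFFINE 1", "AFFINE 2", "AFFINE 3"],        # AFFINE set
    "01": ["temporal 0", "temporal 1", "temporal 2", "temporal 3"],  # adjacent temporal set
    "11": ["AFFINE", "ATMVP", "STMVP"],                            # sub-block set
}

def decode_reference(first_index, second_value):
    return CANDIDATE_SOURCES[first_index][second_value]
```

This reproduces the four decoded cases above, e.g. `decode_reference("0", 2)` selects space 2 of the Merge list and `decode_reference("11", 1)` selects the ATMVP candidate.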
本申请实施例提供一种预测运动信息的解码装置,该装置可以为视频解码器,也可以为视频编码器,还可以为解码器。具体的,预测运动信息的解码装置用于执行以上预测运动信息的解码方法中的解码装置所执行的步骤。本申请实施例提供的预测运动信息的解码装置可以包括相应步骤所对应的模块。An embodiment of the present application provides a decoding device for predicting motion information. The device may be a video decoder, a video encoder, or a decoder. Specifically, the decoding apparatus for predicting motion information is configured to perform the steps performed by the decoding apparatus in the decoding method for predicting motion information. The decoding apparatus for predicting motion information provided in the embodiment of the present application may include a module corresponding to a corresponding step.
本申请实施例可以根据上述方法示例对预测运动信息的解码装置进行功能模块的划分，例如，可以对应各个功能划分各个功能模块，也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本申请实施例中对模块的划分是示意性的，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式。In the embodiments of the present application, the decoding apparatus for predicted motion information may be divided into functional modules according to the foregoing method examples. For example, each functional module may be obtained through division corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module. The division of the modules in the embodiments of the present application is illustrative and is merely logical function division; there may be other division manners in actual implementation.
在采用对应各个功能划分各个功能模块的情况下，图17示出上述实施例中所涉及的预测运动信息的解码装置的一种可能的结构示意图。如图17所示，预测运动信息的解码装置1700可以包括解析模块1701、确定模块1702、赋值模块1703。具体的，各模块功能如下：When each functional module is obtained through division corresponding to each function, FIG. 17 is a schematic diagram of a possible structure of the decoding apparatus for predicted motion information involved in the foregoing embodiments. As shown in FIG. 17, the decoding apparatus 1700 for predicted motion information may include a parsing module 1701, a determination module 1702, and an assignment module 1703. Specifically, the functions of the modules are as follows:
解析模块1701，用于解析码流以获得第一标识。The parsing module 1701 is configured to parse a code stream to obtain a first identifier.
确定模块1702，用于根据第一标识，从第一候选集合中确定目标元素，第一候选集合中的元素包括至少一个第一候选运动信息和至少一个第二候选集合，第二候选集合中的元素包括多个第二候选运动信息，或者，第一候选运动信息包括第一运动信息，第二候选运动信息包括预设的运动信息偏移量。The determination module 1702 is configured to determine a target element from a first candidate set according to the first identifier, where the elements in the first candidate set include at least one first candidate motion information and at least one second candidate set, and the elements in the second candidate set include a plurality of second candidate motion information; or, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset.
赋值模块1703,当目标元素为第一候选运动信息时,用于将第一候选运动信息作为目标运动信息,目标运动信息用来预测待处理图像块的运动信息。The assignment module 1703 is configured to use the first candidate motion information as the target motion information when the target element is the first candidate motion information, and the target motion information is used to predict the motion information of the image block to be processed.
解析模块1701，当目标元素为第二候选集合时，还用于解析码流以获得第二标识，根据第二标识，确定模块1702还用于从多个第二候选运动信息中确定目标运动信息。或者，解析模块1701用于当目标元素根据多个第二候选运动信息获得时，解析码流以获得第二标识，根据第二标识，基于多个第二候选运动信息中的一个，确定目标运动信息。When the target element is the second candidate set, the parsing module 1701 is further configured to parse the code stream to obtain a second identifier, and the determination module 1702 is further configured to determine the target motion information from the plurality of second candidate motion information according to the second identifier. Alternatively, the parsing module 1701 is configured to: when the target element is obtained according to the plurality of second candidate motion information, parse the code stream to obtain a second identifier, and the target motion information is determined, according to the second identifier, based on one of the plurality of second candidate motion information.
其中，解析模块1701用于支持该预测运动信息的解码装置1700执行上述实施例中的S1501及S1505等，和/或用于本文所描述的技术的其它过程。确定模块1702用于支持该预测运动信息的解码装置1700执行上述实施例中的S1502等，和/或用于本文所描述的技术的其它过程。赋值模块1703用于支持该预测运动信息的解码装置1700执行上述实施例中的S1502等，和/或用于本文所描述的技术的其它过程。The parsing module 1701 is configured to support the decoding apparatus 1700 for predicted motion information in performing S1501, S1505, and the like in the foregoing embodiments, and/or other processes of the techniques described herein. The determination module 1702 is configured to support the decoding apparatus 1700 for predicted motion information in performing S1502 and the like in the foregoing embodiments, and/or other processes of the techniques described herein. The assignment module 1703 is configured to support the decoding apparatus 1700 for predicted motion information in performing S1502 and the like in the foregoing embodiments, and/or other processes of the techniques described herein.
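A minimal sketch of how the module split described above could look in code; the class and method names are invented for illustration, the entropy decoding is mocked as plain values, and the candidates are placeholder objects.

```python
class PredictedMotionInfoDecoder:
    """Mirrors the parsing (1701), determination (1702), and assignment
    (1703) modules; candidates are plain placeholder values here."""

    def __init__(self, first_candidates, second_candidates):
        self.first_candidates = first_candidates
        self.second_candidates = second_candidates

    def decode(self, first_id, second_id=None):
        # Determination module: the first identifier selects the target element.
        if first_id < len(self.first_candidates):
            # Assignment module: a first candidate is used directly as the
            # target motion information.
            return self.first_candidates[first_id]
        # Otherwise the parsing module supplies a second identifier, which
        # selects among the second candidate motion information.
        return self.second_candidates[second_id]
```

For example, with one first candidate and two second candidates, `decode(0)` returns the first candidate directly, while `decode(1, 1)` falls through to the second candidate set.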
在一种可行的实施方式中，当确定模块1702确定的目标元素为第二候选集合时，或者，当目标元素根据所述多个第二候选运动信息获得时，解析模块1701还用于：解析码流以获得第三标识，第三标识包括预设系数。In a feasible implementation, when the target element determined by the determination module 1702 is the second candidate set, or when the target element is obtained according to the plurality of second candidate motion information, the parsing module 1701 is further configured to parse the code stream to obtain a third identifier, where the third identifier includes a preset coefficient.
进一步的，如图17所示，预测运动信息的解码装置1700还可以包括计算模块1704，用于将所述多个预设的运动信息偏移量和所述预设系数相乘，以得到多个调整后的运动信息偏移量。对应的，确定模块1702，具体用于根据所述第二标识从所述多个调整后的运动信息偏移量中确定所述目标偏移量。Further, as shown in FIG. 17, the decoding apparatus 1700 for predicted motion information may further include a calculation module 1704, configured to multiply the plurality of preset motion information offsets by the preset coefficient to obtain a plurality of adjusted motion information offsets. Correspondingly, the determination module 1702 is specifically configured to determine the target offset from the plurality of adjusted motion information offsets according to the second identifier.
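The scaling step performed by the calculation module 1704 can be illustrated as follows. The offset values and the coefficient here are assumptions for the sketch, not values from the embodiments, and the motion information is reduced to a single motion vector.

```python
# The preset offsets are multiplied by the preset coefficient carried in the
# third identifier; the second identifier then selects the target offset,
# which is added to the first motion information (a motion vector here).
PRESET_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # assumed (dx, dy) values

def derive_target_mv(base_mv, second_id, coeff):
    adjusted = [(dx * coeff, dy * coeff) for dx, dy in PRESET_OFFSETS]
    dx, dy = adjusted[second_id]
    return (base_mv[0] + dx, base_mv[1] + dy)
```

For instance, with base vector (4, -2), second identifier 2 and coefficient 2, the adjusted offset is (0, 2) and the target motion vector is (4, 0).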
其中,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。Wherein, all relevant content of each step involved in the above method embodiment can be referred to the functional description of the corresponding functional module, which will not be repeated here.
虽然关于视频编码器100及视频解码器200已描述本申请的特定方面，但应理解，本申请的技术可通过许多其它视频编码和/或解码单元、处理器、处理单元、例如编码器/解码器（CODEC）的基于硬件的编码单元及类似者来应用。此外，应理解，仅作为可行的实施方式而提供关于图17所展示及描述的步骤。即，图17的可行的实施方式中所展示的步骤无需必定按图17中所展示的次序执行，且可执行更少、额外或替代步骤。Although specific aspects of the present application have been described with respect to the video encoder 100 and the video decoder 200, it should be understood that the techniques of the present application may be applied by many other video encoding and/or decoding units, processors, processing units, hardware-based coding units such as encoders/decoders (CODECs), and the like. Furthermore, it should be understood that the steps shown and described with respect to FIG. 17 are provided only as a feasible implementation. That is, the steps shown in the feasible implementation of FIG. 17 need not necessarily be performed in the order shown in FIG. 17, and fewer, additional, or alternative steps may be performed.
在采用集成的单元的情况下,图18为本申请实施例中的预测运动信息的解码设备1800的一种示意性结构框图。具体的,预测运动信息的解码装置1800包括:处理器1801和耦合于所述处理器的存储器1802;所述处理器1801用于执行图17所示的实施例以及各种可行的实施方式。In the case of using an integrated unit, FIG. 18 is a schematic structural block diagram of a decoding device 1800 for predicting motion information in an embodiment of the present application. Specifically, the decoding device 1800 for predicting motion information includes: a processor 1801 and a memory 1802 coupled to the processor; the processor 1801 is configured to execute the embodiment shown in FIG. 17 and various feasible implementations.
其中，处理模块1801可以是处理器或控制器，例如可以是中央处理器（Central Processing Unit，CPU），通用处理器，数字信号处理器（Digital Signal Processor，DSP），ASIC，FPGA或者其他可编程逻辑器件、晶体管逻辑器件、硬件部件或者其任意组合。其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框，模块和电路。所述处理器也可以是实现计算功能的组合，例如包含一个或多个微处理器组合，DSP和微处理器的组合等等。存储模块1802可以是存储器。The processing module 1801 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described with reference to the disclosure of the present application. Alternatively, the processor may be a combination implementing a computing function, for example, a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The storage module 1802 may be a memory.
其中,上述方法实施例涉及的各场景的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。Wherein, all relevant content of each scenario involved in the foregoing method embodiment can be referred to the functional description of the corresponding functional module, which will not be repeated here.
上述预测运动信息的解码装置1700和预测运动信息的解码装置1800均可执行上述图15所示的预测运动信息的解码方法，预测运动信息的解码装置1700和预测运动信息的解码装置1800具体可以是视频解码装置或者其他具有视频编解码功能的设备。预测运动信息的解码装置1700和预测运动信息的解码装置1800可以用于在解码过程中进行图像预测。Both the decoding apparatus 1700 for predicted motion information and the decoding device 1800 for predicted motion information may perform the decoding method for predicted motion information shown in FIG. 15 above. The decoding apparatus 1700 and the decoding device 1800 may specifically be video decoding apparatuses or other devices having video encoding and decoding functions, and may be used to perform image prediction in a decoding process.
本申请实施例提供一种帧间预测装置,该帧间预测装置可以为视频解码器,也可以为视频编码器,还可以为解码器。具体的,帧间预测装置用于执行以上帧间预测方法中的帧间预测装置所执行的步骤。本申请实施例提供的帧间预测装置可以包括相应步骤所对应的模块。An embodiment of the present application provides an inter prediction device. The inter prediction device may be a video decoder, a video encoder, or a decoder. Specifically, the inter prediction apparatus is configured to perform the steps performed by the inter prediction apparatus in the above inter prediction method. The inter prediction apparatus provided in the embodiment of the present application may include a module corresponding to a corresponding step.
本申请实施例可以根据上述方法示例对帧间预测装置进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。In the embodiment of the present application, functional modules of the inter prediction device may be divided according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated modules may be implemented in the form of hardware or software functional modules. The division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
本申请还提供一种终端，该终端包括：一个或多个处理器、存储器、通信接口。该存储器、通信接口与一个或多个处理器耦合；存储器用于存储计算机程序代码，计算机程序代码包括指令，当一个或多个处理器执行指令时，终端执行本申请实施例的预测运动信息的解码方法。The present application further provides a terminal, including one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled to the one or more processors; the memory is configured to store computer program code, the computer program code includes instructions, and when the one or more processors execute the instructions, the terminal performs the decoding method for predicted motion information in the embodiments of the present application.
这里的终端可以是视频显示设备,智能手机,便携式电脑以及其它可以处理视频或者播放视频的设备。The terminal here can be a video display device, a smart phone, a portable computer, and other devices that can process or play videos.
本申请还提供一种视频解码器，包括非易失性存储介质，以及中央处理器，所述非易失性存储介质存储有可执行程序，所述中央处理器与所述非易失性存储介质连接，并执行所述可执行程序以实现本申请实施例的预测运动信息的解码方法。The present application further provides a video decoder, including a non-volatile storage medium and a central processing unit. The non-volatile storage medium stores an executable program, and the central processing unit is connected to the non-volatile storage medium and executes the executable program to implement the decoding method for predicted motion information in the embodiments of the present application.
本申请还提供一种解码器,所述解码器包括本申请实施例中的预测运动信息的解码装置。The present application further provides a decoder, which includes a decoding apparatus for predicting motion information in the embodiment of the present application.
本申请另一实施例还提供一种计算机可读存储介质，该计算机可读存储介质包括一个或多个程序代码，该一个或多个程序包括指令，当终端中的处理器在执行该程序代码时，该终端执行如图15所示的预测运动信息的解码方法。Another embodiment of the present application further provides a computer-readable storage medium. The computer-readable storage medium includes one or more pieces of program code, the one or more programs include instructions, and when a processor in a terminal executes the program code, the terminal performs the decoding method for predicted motion information shown in FIG. 15.
在本申请的另一实施例中，还提供一种计算机程序产品，该计算机程序产品包括计算机执行指令，该计算机执行指令存储在计算机可读存储介质中；终端的至少一个处理器可以从计算机可读存储介质读取该计算机执行指令，至少一个处理器执行该计算机执行指令使得终端实施执行如图15所示的预测运动信息的解码方法。In another embodiment of the present application, a computer program product is further provided. The computer program product includes computer-executable instructions stored in a computer-readable storage medium; at least one processor of the terminal may read the computer-executable instructions from the computer-readable storage medium, and the at least one processor executes the computer-executable instructions to cause the terminal to perform the decoding method for predicted motion information shown in FIG. 15.
在上述实施例中，可以全部或部分地通过软件，硬件，固件或者其任意组合来实现。当使用软件程序实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时，全部或部分地产生按照本申请实施例所述的流程或功能。All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by a software program, the embodiments may be implemented entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated entirely or partially.
所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线（例如同轴电缆、光纤、数字用户线（DSL））或无线（例如红外、无线、微波等）方式向另一个网站站点、计算机、服务器或数据中心传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。该可用介质可以是磁性介质（例如软盘、硬盘、磁带）、光介质（例如DVD）或者半导体介质（例如固态硬盘（Solid State Disk，SSD））等。The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
此外，应理解，取决于可行的实施方式，本文中所描述的方法中的任一者的特定动作或事件可按不同序列执行，可经添加、合并或一起省去（例如，并非所有所描述的动作或事件为实践方法所必要的）。此外，在特定可行的实施方式中，动作或事件可（例如）经由多线程处理、中断处理或多个处理器来同时而非顺序地执行。另外，虽然出于清楚的目的将本申请的特定方面描述为通过单一模块或单元执行，但应理解，本申请的技术可通过与视频解码器相关联的单元或模块的组合执行。In addition, it should be understood that, depending on the feasible implementation, particular actions or events of any of the methods described herein may be performed in a different sequence, and may be added, merged, or omitted altogether (for example, not all described actions or events are necessary to practice the methods). Furthermore, in certain feasible implementations, actions or events may be performed simultaneously rather than sequentially, for example, through multi-threaded processing, interrupt processing, or multiple processors. In addition, although specific aspects of the present application are described, for clarity, as being performed by a single module or unit, it should be understood that the techniques of the present application may be performed by a combination of units or modules associated with a video decoder.
在一个或多个可行的实施方式中，所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施，那么功能可作为一个或多个指令或代码而存储于计算机可读媒体上或经由计算机可读媒体来传输，且通过基于硬件的处理单元来执行。计算机可读媒体可包含计算机可读存储媒体或通信媒体，计算机可读存储媒体对应于例如数据存储媒体的有形媒体，通信媒体包含促进计算机程序（例如）根据通信协议从一处传送到另一处的任何媒体。In one or more feasible implementations, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any media that facilitate transfer of a computer program from one place to another, for example, according to a communication protocol.
以这个方式,计算机可读媒体示例性地可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)例如信号或载波的通信媒体。数据存储媒体可为可由一个或多个计算机或一个或多个处理器存取以检索用于实施本申请中所描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。In this manner, computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application. The computer program product may include a computer-readable medium.
作为可行的实施方式而非限制，此计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用于存储呈指令或数据结构的形式的所要代码且可由计算机存取的任何其它媒体。同样，任何连接可适当地称作计算机可读媒体。例如，如果使用同轴缆线、光纤缆线、双绞线、数字订户线（DSL），或例如红外线、无线电及微波的无线技术而从网站、服务器或其它远端源传输指令，那么同轴缆线、光纤缆线、双绞线、DSL，或例如红外线、无线电及微波的无线技术包含于媒体的定义中。As a feasible implementation and not by way of limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
然而，应理解，计算机可读存储媒体及数据存储媒体不包含连接、载波、信号或其它暂时性媒体，而替代地针对非暂时性有形存储媒体。如本文中所使用，磁盘及光盘包含紧密光盘（CD）、雷射光盘、光盘、数字多功能光盘（DVD）、软性磁盘及蓝光光盘，其中磁盘通常以磁性方式再现数据，而光盘通过雷射以光学方式再现数据。以上各物的组合也应包含于计算机可读媒体的范围内。It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, disks and discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
可通过例如一个或多个数字信号处理器（DSP）、通用微处理器、专用集成电路（ASIC）、现场可编程门阵列（FPGA）或其它等效集成或离散逻辑电路的一个或多个处理器来执行指令。因此，如本文中所使用，术语“处理器”可指前述结构或适于实施本文中所描述的技术的任何其它结构中的任一者。另外，在一些方面中，可将本文所描述的功能性提供于经配置以用于编码及解码的专用硬件和/或软件模块内，或并入于组合式编码解码器中。同样，技术可完全实施于一个或多个电路或逻辑元件中。The instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, as used herein, the term "processor" may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
本申请的技术可实施于广泛多种装置或设备中，包含无线手机、集成电路（IC）或IC的集合（例如，芯片组）。本申请中描述各种组件、模块或单元以强调经配置以执行所揭示的技术的装置的功能方面，但未必需要通过不同硬件单元实现。更确切来说，如前文所描述，各种单元可组合于编码解码器硬件单元中或由互操作的硬件单元（包含如前文所描述的一个或多个处理器）结合合适软件和/或固件的集合来提供。The techniques of the present application may be implemented in a wide variety of devices or apparatuses, including wireless handsets, integrated circuits (ICs), or sets of ICs (for example, chipsets). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
以上所述，仅为本申请示例性的具体实施方式，但本申请的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本申请揭露的技术范围内，可轻易想到的变化或替换，都应涵盖在本申请的保护范围之内。因此，本申请的保护范围应该以权利要求的保护范围为准。The foregoing descriptions are merely exemplary specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. 一种预测运动信息的解码方法,其特征在于,包括:A decoding method for predicting motion information, comprising:
    解析码流以获得第一标识;Parse the code stream to obtain a first identifier;
    根据所述第一标识,从第一候选集合中确定目标元素,所述第一候选集合中的元素包括至少一个第一候选运动信息和多个第二候选运动信息,所述第一候选运动信息包括第一运动信息,所述第二候选运动信息包括预设的运动信息偏移量;Determining a target element from a first candidate set according to the first identifier, the elements in the first candidate set including at least one first candidate motion information and a plurality of second candidate motion information, the first candidate motion information Including first motion information, and the second candidate motion information includes a preset motion information offset;
    当所述目标元素为所述第一候选运动信息时,将所述第一候选运动信息作为目标运动信息,所述目标运动信息用来预测待处理图像块的运动信息;When the target element is the first candidate motion information, use the first candidate motion information as target motion information, and the target motion information is used to predict motion information of an image block to be processed;
    当所述目标元素根据所述多个第二候选运动信息获得时，解析所述码流以获得第二标识，根据所述第二标识，基于所述多个第二候选运动信息中的一个，确定所述目标运动信息。When the target element is obtained according to the plurality of second candidate motion information, parsing the code stream to obtain a second identifier, and determining, according to the second identifier, the target motion information based on one of the plurality of second candidate motion information.
  2. 根据权利要求1所述的方法,其特征在于,所述第一候选运动信息包括所述待处理图像块的空域相邻图像块的运动信息。The method according to claim 1, wherein the first candidate motion information includes motion information of a spatially adjacent image block of the image block to be processed.
  3. 根据权利要求1或2所述的方法,其特征在于,所述第二候选运动信息基于所述第一运动信息和预设的运动信息偏移量获得。The method according to claim 1 or 2, wherein the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
  4. 根据权利要求1或2所述的方法，其特征在于，所述根据所述第二标识，基于所述多个第二候选运动信息中的一个，确定所述目标运动信息，包括：The method according to claim 1 or 2, wherein the determining, according to the second identifier, the target motion information based on one of the plurality of second candidate motion information comprises:
    根据所述第二标识从多个预设的运动信息偏移量中确定目标偏移量;Determining a target offset from a plurality of preset motion information offsets according to the second identifier;
    基于所述第一运动信息和所述目标偏移量确定所述目标运动信息。The target motion information is determined based on the first motion information and the target offset.
  5. 根据权利要求1至4任一项所述的方法,其特征在于,在所述至少一个第一候选运动信息中,用于标识所述第一运动信息的编码码字最短。The method according to any one of claims 1 to 4, characterized in that, among the at least one first candidate motion information, an encoding codeword for identifying the first motion information is shortest.
  6. 根据权利要求1至5任一项所述的方法,其特征在于,当所述目标元素根据所述多个第二候选运动信息获得时,所述方法还包括:The method according to any one of claims 1 to 5, wherein when the target element is obtained according to the plurality of second candidate motion information, the method further comprises:
    解析所述码流以获得第三标识,所述第三标识包括预设系数。Parse the code stream to obtain a third identifier, where the third identifier includes a preset coefficient.
  7. 根据权利要求6所述的方法,其特征在于,在根据所述第二标识,基于所述多个第二候选运动信息中的一个,确定所述目标运动信息之前,所述方法还包括:The method according to claim 6, wherein before the determining the target motion information based on the second identifier based on one of the plurality of second candidate motion information, the method further comprises:
    将多个预设的运动信息偏移量和预设系数相乘,以得到多个调整后的运动信息偏移量。Multiply a plurality of preset motion information offsets by a preset coefficient to obtain a plurality of adjusted motion information offsets.
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述目标运动信息用来预测待处理图像块的运动信息,包括:The method according to any one of claims 1 to 7, wherein the target motion information is used to predict motion information of an image block to be processed, and includes:
    将所述目标运动信息作为所述待处理图像块的运动信息;或者,将所述目标运动信息作为所述待处理图像块的预测运动信息。Use the target motion information as the motion information of the image block to be processed; or use the target motion information as the predicted motion information of the image block to be processed.
  9. 根据权利要求1至8任一项所述的方法,其特征在于,所述第二标识采用定长编码方式。The method according to any one of claims 1 to 8, wherein the second identifier adopts a fixed-length encoding method.
  10. 根据权利要求1至8任一项所述的方法,其特征在于,所述第二标识采用变长编码方式。The method according to any one of claims 1 to 8, wherein the second identifier adopts a variable length coding method.
  11. 一种预测运动信息的解码装置,其特征在于,包括:A decoding device for predicting motion information, comprising:
    解析模块,用于解析码流以获得第一标识;A parsing module, configured to parse a code stream to obtain a first identifier;
    确定模块，用于根据所述第一标识，从第一候选集合中确定目标元素，所述第一候选集合中的元素包括至少一个第一候选运动信息和多个第二候选运动信息，所述第一候选运动信息包括第一运动信息，所述第二候选运动信息包括预设的运动信息偏移量；A determining module, configured to determine a target element from a first candidate set according to the first identifier, where the elements in the first candidate set include at least one first candidate motion information and a plurality of second candidate motion information, the first candidate motion information includes first motion information, and the second candidate motion information includes a preset motion information offset;
    赋值模块,用于当所述目标元素为所述第一候选运动信息时,将所述第一候选运动信息作为目标运动信息,所述目标运动信息用来预测待处理图像块的运动信息;An assignment module, configured to use the first candidate motion information as target motion information when the target element is the first candidate motion information, and the target motion information is used to predict motion information of an image block to be processed;
    所述解析模块还用于，当所述目标元素根据所述多个第二候选运动信息获得时，解析所述码流以获得第二标识，根据所述第二标识，基于所述多个第二候选运动信息中的一个，确定所述目标运动信息。The parsing module is further configured to: when the target element is obtained according to the plurality of second candidate motion information, parse the code stream to obtain a second identifier, and determine, according to the second identifier, the target motion information based on one of the plurality of second candidate motion information.
  12. The apparatus according to claim 11, wherein the first candidate motion information includes motion information of a spatially adjacent image block of the image block to be processed.
  13. The apparatus according to claim 11 or 12, wherein the second candidate motion information is obtained based on the first motion information and a preset motion information offset.
  14. The apparatus according to claim 11 or 12, wherein the parsing module is specifically configured to:
    determine a target offset from a plurality of preset motion information offsets according to the second identifier; and
    determine the target motion information based on the first motion information and the target offset.
  15. The apparatus according to any one of claims 11 to 14, wherein, among the at least one piece of first candidate motion information, the coded codeword identifying the first motion information is the shortest.
  16. The apparatus according to any one of claims 11 to 15, wherein, when the target element is obtained based on the plurality of pieces of second candidate motion information, the parsing module is further configured to:
    parse the bitstream to obtain a third identifier, where the third identifier includes a preset coefficient.
  17. The apparatus according to claim 16, wherein the apparatus further comprises:
    a calculation module, configured to multiply a plurality of preset motion information offsets by the preset coefficient to obtain a plurality of adjusted motion information offsets.
  18. The apparatus according to any one of claims 11 to 17, wherein the determining module is specifically configured to:
    use the target motion information as the motion information of the image block to be processed; or use the target motion information as the predicted motion information of the image block to be processed.
  19. The apparatus according to any one of claims 11 to 18, wherein the second identifier is encoded using fixed-length coding.
  20. The apparatus according to any one of claims 11 to 18, wherein the second identifier is encoded using variable-length coding.
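The selection flow recited in claims 11 to 17 — choose a target element by the first identifier; if it is offset-based, parse a second identifier to pick a preset offset, optionally scale all offsets by a coefficient carried in a third identifier, and add the chosen offset to the first motion information — can be sketched as below. This is an illustrative reconstruction only, not the patented implementation: the offset table values, the `bitstream` reader interface, and all function names are assumptions, and the actual entropy coding of the identifiers is defined by the claims and description, not by this sketch.

```python
# Illustrative sketch of the candidate-selection flow of claims 11-17.
# PRESET_OFFSETS plays the role of the "plurality of pieces of second
# candidate motion information"; the (dx, dy) values are hypothetical.
PRESET_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def decode_target_motion_info(bitstream, first_candidates):
    """Return the target motion information as an (mvx, mvy) pair.

    first_candidates: list of (mvx, mvy) first candidate motion information,
    e.g. from spatially adjacent blocks (claim 12); index 0 is the "first
    motion information", which per claim 15 gets the shortest codeword.
    """
    first_id = bitstream.read_first_identifier()
    if first_id < len(first_candidates):
        # Target element is a first candidate: use it directly as the
        # target motion information (claim 11).
        return first_candidates[first_id]
    # Target element is obtained from the second candidate motion
    # information: parse the second identifier and select a target offset
    # from the preset offsets (claim 14).
    second_id = bitstream.read_second_identifier()
    dx, dy = PRESET_OFFSETS[second_id]
    # Third identifier carries a preset coefficient that scales the
    # preset offsets (claims 16-17).
    coeff = bitstream.read_third_identifier()
    base_x, base_y = first_candidates[0]  # the first motion information
    return (base_x + dx * coeff, base_y + dy * coeff)
```

In this reading, the second identifier only needs to index a small fixed offset table, which is why claims 9/10 and 19/20 can commit it to either fixed-length or variable-length coding independently of the candidate list size.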
PCT/CN2019/105711 2018-09-13 2019-09-12 Decoding method and device for predicted motion information WO2020052653A1 (en)

Priority Applications (9)

Application Number Priority Date Filing Date Title
EP19860217.9A EP3843404A4 (en) 2018-09-13 2019-09-12 Decoding method and device for predicted motion information
SG11202102362UA SG11202102362UA (en) 2018-09-13 2019-09-12 Decoding method and decoding apparatus for predicting motion information
BR112021004429-9A BR112021004429A2 (en) 2018-09-13 2019-09-12 decoding method and decoding apparatus for predicting motion information
KR1020247028818A KR20240135033A (en) 2018-09-13 2019-09-12 Decoding method and decoding apparatus for predicting motion information
KR1020217010321A KR102701208B1 (en) 2018-09-13 2019-09-12 Decoding method and decoding device for predicting motion information
JP2021513418A JP7294576B2 (en) 2018-09-13 2019-09-12 Decoding method and decoding device for predicting motion information
CA3112289A CA3112289A1 (en) 2018-09-13 2019-09-12 Decoding method and decoding apparatus for predicting motion information
US17/198,544 US20210203944A1 (en) 2018-09-13 2021-03-11 Decoding method and decoding apparatus for predicting motion information
ZA2021/01890A ZA202101890B (en) 2018-09-13 2021-03-19 Decoding method and decoding apparatus for predicting motion information

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201811068957.4 2018-09-13
CN201811068957 2018-09-13
CN201811264674.7 2018-10-26
CN201811264674.7A CN110896485B (en) 2018-09-13 2018-10-26 Decoding method and device for predicting motion information

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/198,544 Continuation US20210203944A1 (en) 2018-09-13 2021-03-11 Decoding method and decoding apparatus for predicting motion information

Publications (1)

Publication Number Publication Date
WO2020052653A1 true WO2020052653A1 (en) 2020-03-19

Family

ID=69778170

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/105711 WO2020052653A1 (en) 2018-09-13 2019-09-12 Decoding method and device for predicted motion information

Country Status (1)

Country Link
WO (1) WO2020052653A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103907346A (en) * 2011-10-11 2014-07-02 联发科技股份有限公司 Method and apparatus of motion and disparity vector derivation for 3D video coding and HEVC
CN104126302A (en) * 2011-11-07 2014-10-29 高通股份有限公司 Generating additional merge candidates
CN105308967A (en) * 2013-04-05 2016-02-03 三星电子株式会社 Video stream coding method according to prediction structure for multi-view video and device therefor, and video stream decoding method according to prediction structure for multi-view video and device therefor
EP3062518A1 (en) * 2013-10-24 2016-08-31 Electronics and Telecommunications Research Institute Video encoding/decoding method and apparatus


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3843404A4 *
XU CHEN, NA ZHANG, JIANHUA ZHENG: "CE4: Enhanced Merge Mode (Test 4.2.15)", Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting, no. JVET-K0198, 18 July 2018 (2018-07-18), Ljubljana, SI, pages 1-8, XP030199235 *

Similar Documents

Publication Publication Date Title
US10609423B2 (en) Tree-type coding for video coding
CN109996081B (en) Image prediction method, device and coder-decoder
TWI544783B (en) Method,device and computer-readable storage medium of coding video data
KR20210024165A (en) Inter prediction method and apparatus
US20130272409A1 (en) Bandwidth reduction in video coding through applying the same reference index
US20130272380A1 (en) Grouping bypass coded syntax elements in video coding
US11172212B2 (en) Decoder-side refinement tool on/off control
KR102542196B1 (en) Video coding method and apparatus
US20210185325A1 (en) Motion vector obtaining method and apparatus, computer device, and storage medium
CN111200735B (en) Inter-frame prediction method and device
US20210203944A1 (en) Decoding method and decoding apparatus for predicting motion information
JP6224851B2 (en) System and method for low complexity coding and background detection
US11394996B2 (en) Video coding method and apparatus
JP7331105B2 (en) INTER-FRAME PREDICTION METHOD AND RELATED APPARATUS
WO2019000443A1 (en) Inter-frame prediction method and device
WO2020042758A1 (en) Interframe prediction method and device
WO2020052653A1 (en) Decoding method and device for predicted motion information
CN110855993A (en) Method and device for predicting motion information of image block
WO2020024275A1 (en) Inter-frame prediction method and device
WO2020038232A1 (en) Method and apparatus for predicting movement information of image block

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19860217

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 3112289

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2021513418

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112021004429

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 2019860217

Country of ref document: EP

Effective date: 20210323

ENP Entry into the national phase

Ref document number: 20217010321

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 112021004429

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20210309