WO2020024275A1 - Inter-frame prediction method and device - Google Patents
Inter-frame prediction method and device
- Publication number
- WO2020024275A1 (PCT/CN2018/098581)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- motion information
- image block
- processed
- candidate
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
Definitions
- the present application relates to the technical field of video images, and in particular, to a method and a device for inter prediction.
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conference devices, video streaming devices, and the like.
- Digital video equipment implements video compression technologies, such as those described in the ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC) standards and the extensions of those standards, to transmit and receive digital video information more efficiently.
- Video devices can implement these video codec technologies to more efficiently transmit, receive, encode, decode, and / or store digital video information.
- Video compression techniques perform spatial (intra-image) prediction and / or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
- a video slice may be divided into video blocks, where a video block may also be referred to as a tree block, a coding unit (CU), and/or a decoding node.
- Video blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
- Video blocks in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images.
- An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
- the embodiments of the present application provide a method and an apparatus for inter prediction that select suitable candidate motion information as the motion information prediction value of an image block to be processed, improving the effectiveness of motion information prediction and the encoding and decoding efficiency.
- the motion information includes motion vectors and index information of a reference frame pointed to by the motion vectors, and the like.
- the prediction of the motion information refers to the prediction of a motion vector.
- a method for predicting motion information of an image block, including: obtaining at least two target pixel points having a preset position relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of, and not adjacent to, the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and predicting the motion information of the image block to be processed according to the target motion information.
- the beneficial effect of this implementation mode is that motion information of a non-adjacent image block on the left side of a block to be processed is used as candidate motion information of the block to be processed, and more spatially prior coding information is used to improve coding performance.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the beneficial effect of this implementation mode is that, when the candidate prediction motion information is expressed with a variable-length encoding method, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the target pixel points and the motion information of the image block to be processed, properly determining the acquisition order of the target pixel points helps select a better codeword encoding strategy and improves encoding performance.
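As an illustrative sketch (not taken from the patent itself), a truncated-unary binarization shows how a variable-length scheme assigns shorter codewords to candidates earlier in the order, which is how the constraint N ≤ M can arise; the function name and the choice of truncated unary are assumptions:

```python
def truncated_unary(index, max_index):
    """Binarize a candidate index with truncated unary coding:
    index k -> k ones followed by a terminating zero, with the zero
    dropped for the last index. Earlier indices get shorter codewords."""
    if index < max_index:
        return "1" * index + "0"
    return "1" * index

# Earlier-listed candidates receive shorter (or equal) codewords,
# matching the claimed relationship N <= M.
codewords = [truncated_unary(k, 4) for k in range(5)]
```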
- the position of the second candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the second candidate pixel point is located at one of a set of preset coordinate points in this orthogonal coordinate system.
- the beneficial effect of this implementation mode is that it provides multiple possibilities for the selection of the second candidate pixel point according to the actual coding requirements, and can achieve a balance between performance, complexity, and software and hardware consumption.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the beneficial effect of this implementation mode is that the position of the second candidate pixel point is selected according to the size of the image block to be processed, which is consistent with the local motion characteristics of the image block to be processed, making the selection more reasonable.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- the beneficial effect of this implementation mode is that the selection of the position of the second candidate pixel point and the distribution of the motion information of the motion vector field are kept consistent, and the balance of position selection is ensured.
- w × i is less than or equal to a first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- This embodiment has the beneficial effect of limiting the selection range of the position of the second candidate pixel point, and ensuring the balance between the coding performance and the storage space.
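A minimal sketch of how such a range limit might be applied when enumerating left-side candidate positions; the coordinate pattern (-w·i, h·j), the number of vertical offsets, and the function name are illustrative assumptions, not the patent's definitive procedure:

```python
def left_candidate_positions(w, h, ctu_width, use_double=False):
    """Enumerate hypothetical left-side candidate coordinates (-w*i, h*j)
    in the block-relative coordinate system, keeping only those with
    w*i <= first threshold (the CTU width, optionally doubled), as the
    claim limits the horizontal selection range."""
    threshold = 2 * ctu_width if use_double else ctu_width
    positions = []
    i = 1
    while w * i <= threshold:
        for j in range(3):  # a few vertical offsets, for illustration only
            positions.append((-w * i, h * j))
        i += 1
    return positions
```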
- the acquiring of at least two target pixel points having a preset position relationship with the image block to be processed includes: acquiring, in a preset order, multiple second candidate pixel points among the at least two target pixel points, where, when a second candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a top-to-bottom zigzag (polyline) order.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- the beneficial effect of this implementation mode is that, when a variable-length coding method is adopted for representing the motion information corresponding to each second candidate pixel point, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the second candidate pixel points and the motion information of the image block to be processed, appropriately determining the acquisition order helps select a better codeword coding strategy and improves coding performance.
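The short-to-long distance order can be sketched as a sort by Manhattan distance; this is an illustrative helper, with tie-breaking behavior assumed (stable sort keeps the input order) rather than specified by the patent:

```python
def order_candidates(points):
    """Order candidate points from short to long distance, where the
    distance is |x| + |y| (the sum of the absolute values of the
    coordinates, as in the claimed short-to-long order). Python's sort
    is stable, so ties keep their original relative order."""
    return sorted(points, key=lambda p: abs(p[0]) + abs(p[1]))
```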
- motion information of at least two target pixel points is the same.
- the beneficial effect of this implementation mode is that the pruning operation is not performed when constructing the candidate motion information list, which reduces complexity.
- the acquiring of at least two target pixel points having a preset position relationship with the image block to be processed includes: sequentially obtaining candidate pixel points having the preset position relationship with the image block to be processed; determining that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and using the candidate pixel points with different motion information as the target pixel points.
- the beneficial effect of this implementation mode is that redundant information in the candidate motion information list is removed through a pruning operation, and encoding efficiency is improved.
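A hypothetical sketch of this pruning step: each newly acquired candidate's motion information is compared against the list built so far, duplicates are dropped, and the list is capped (the `max_count` parameter stands in for the preset second threshold mentioned below); names and structure are illustrative:

```python
def prune_candidates(candidates, max_count):
    """Build the candidate motion-information list with pruning: a newly
    acquired candidate is appended only if its motion information differs
    from every entry already in the list, and the list is capped at a
    preset maximum number of entries."""
    result = []
    for mv in candidates:
        if mv not in result and len(result) < max_count:
            result.append(mv)
    return result
```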
- the number of the obtained target pixel points is a preset second threshold.
- the beneficial effect of this implementation mode is that, by limiting the number of acquired target pixel points, encoding performance and software and hardware consumption are balanced; in some specific implementation modes, the instability of the decoding system caused by an uncertain total number of entries in the candidate motion information list is also avoided.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting of the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
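The combination of predictor and residual can be sketched as a component-wise sum, as in AMVP-style motion vector reconstruction; the exact combination rule used by the patent may differ, and the names here are illustrative:

```python
def reconstruct_motion_vector(predictor, residual):
    """Combine the target motion information (the motion vector predictor)
    with the decoded motion residual component-wise to obtain the motion
    vector of the image block to be processed."""
    return (predictor[0] + residual[0], predictor[1] + residual[1])
```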
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining of the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the obtained target identification information is encoded.
- the target motion residual information is encoded.
- the above various feasible embodiments apply the motion vector prediction method of the present application to decoding and encoding methods for obtaining the motion vectors of an image block to be processed, namely the merge prediction mode (Merge) and the advanced motion vector prediction (AMVP) mode.
- a method for predicting motion information of an image block, including: determining the availability of at least one target pixel point having a preset position relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of, and not adjacent to, the image block to be processed, and where, when the prediction mode of the image block where a target pixel point is located is intra prediction, that target pixel point is unavailable; adding motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the image block to be processed according to the target motion information.
- the beneficial effect of this implementation mode is that motion information of a non-adjacent image block on the left side of a block to be processed is used as candidate motion information of the block to be processed, and more spatially prior coding information is used to improve coding performance.
- the determining the availability of at least one target pixel point having a preset position relationship with the image block to be processed includes determining the availability of the image block where the target pixel point is located.
- the judgment of availability is based on factors such as the prediction mode of the image block where the target pixel point is located, whether the target pixel point is within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vector corresponding to another position (for example, in the H.265 standard, how the candidate prediction block of the Merge mode is determined for the rectangular block mode).
- when the prediction mode of the image block where the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point is located outside the image edge or the slice edge of the image where the image block to be processed is located, the target pixel point is unavailable.
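A simplified availability check following the two rules above (intra-predicted source block, or position outside the image); slice-boundary handling is omitted, and all names and the image-relative coordinate convention are illustrative assumptions:

```python
def is_available(pixel, image_width, image_height, prediction_mode):
    """Judge target-pixel availability: unavailable if the block the pixel
    belongs to was intra predicted, or if the pixel lies outside the image
    boundary. Slice-edge checks would be handled analogously."""
    x, y = pixel
    if prediction_mode == "intra":
        return False
    if not (0 <= x < image_width and 0 <= y < image_height):
        return False
    return True
```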
- the position of the candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the candidate pixel point is located at at least one of the following coordinate points in this orthogonal coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- the beneficial effect of this implementation mode is that it provides multiple possibilities for the selection of candidate pixels according to actual coding requirements, and can achieve a balance between performance, complexity, and software and hardware consumption.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the beneficial effect of this implementation mode is that the positions of candidate pixel points are selected according to the size of the image block to be processed, which conforms to the local motion characteristics of the image block to be processed, making the selection more reasonable.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- the beneficial effect of this implementation mode is that the selection of the positions of the candidate pixels is consistent with the distribution of the motion information of the motion vector field, and the balance of position selection is ensured.
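The sampling of a per-pixel motion information matrix into a motion vector field at width interval w and height interval h can be sketched with simple stride slicing; this grid-subsampling interpretation and the function name are assumptions:

```python
def sample_motion_field(motion_matrix, w, h):
    """Subsample a per-pixel motion-information matrix (a list of rows)
    at width interval w and height interval h, producing the motion
    vector field with which the candidate positions are kept aligned."""
    return [row[::w] for row in motion_matrix[::h]]
```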
- w × i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the beneficial effect of this implementation mode is to limit the selection range of candidate pixel positions, and ensure the balance between encoding performance and storage space.
- the candidate motion information set is constructed by adding motion information corresponding to multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where, when a candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is the length of the straight line segment connecting the candidate pixel point and the pixel point at the lower-left vertex of the image block to be processed.
- the beneficial effect of this implementation mode is that, when a variable-length coding method is used for representing the motion information corresponding to each candidate pixel point, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the candidate pixel points and the motion information of the image block to be processed, properly determining the acquisition order helps select a better codeword encoding strategy and improves encoding performance.
- the candidate motion information set includes at least two identical pieces of motion information.
- the beneficial effect of this implementation mode is that the pruning operation is not performed when constructing the candidate motion information list, which reduces complexity.
- the adding of the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information already in the candidate motion information set of the image block to be processed; and adding the differing motion information of the available target pixel points to the candidate motion information set of the image block to be processed.
- the beneficial effect of this implementation mode is that redundant information in the candidate motion information list is removed through a pruning operation, and encoding efficiency is improved.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the beneficial effect of this implementation mode is that, by limiting the number of acquired target pixel points, encoding performance and software and hardware consumption are balanced; in some specific implementation modes, the instability of the decoding system caused by an uncertain total number of entries in the candidate motion information list is also avoided.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting of the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining of the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
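The encoder-side selection of the least-cost combination can be sketched with a toy cost model: residual magnitude plus a lambda-weighted candidate index standing in for the identification codeword length. This cost model is an illustrative assumption, not the patent's actual rate-distortion criterion:

```python
def select_best_candidate(candidates, true_mv, lam=4):
    """Pick the index of the candidate whose (identification, residual)
    combination has the least coding cost, modelled here as the residual
    magnitude plus lambda times the candidate index (a stand-in for the
    identification codeword length)."""
    best_index, best_cost = 0, float("inf")
    for idx, mv in enumerate(candidates):
        residual = abs(true_mv[0] - mv[0]) + abs(true_mv[1] - mv[1])
        cost = residual + lam * idx
        if cost < best_cost:
            best_index, best_cost = idx, cost
    return best_index
```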
- the method further includes: encoding the obtained target identification information.
- the method further includes: encoding the target motion residual information.
- the above various feasible embodiments apply the motion vector prediction method of the present application to decoding and encoding methods for obtaining the motion vectors of an image block to be processed, namely the merge prediction mode and the advanced motion vector prediction mode, improving the encoding performance and efficiency of the original methods.
- a device for predicting motion information, including: an acquisition module, configured to acquire at least two target pixel points having a preset position relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of, and not adjacent to, the image block to be processed; an index module, configured to obtain target identification information used to determine target motion information from the motion information corresponding to the at least two target pixel points, where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the position of the second candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the second candidate pixel point is located at one of a set of preset coordinate points in this orthogonal coordinate system.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w × i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the obtaining module is specifically configured to obtain, in the preset order, multiple second candidate pixel points among the at least two target pixel points, where, when a second candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a top-to-bottom zigzag (polyline) order.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- motion information of at least two target pixel points is the same.
- the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset position relationship with the image block to be processed; determine that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and use the candidate pixel points with different motion information as the target pixel points.
- the number of the obtained target pixel points is a preset second threshold.
- the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device is configured to decode the image block to be processed, and the index module is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the index module is specifically configured to parse the code stream to obtain the target identification information.
- the device is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the index module is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the index module is further configured to encode the obtained target identification information.
- the index module is further configured to encode the target motion residual information.
- a device for predicting motion information, including: a detection module, configured to determine the availability of at least one target pixel point having a preset position relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of, and not adjacent to, the image block to be processed.
- an acquisition module is configured to add the available motion information corresponding to the target pixel to the candidate motion information set of the image block to be processed;
- an index module is configured to acquire target identification information, where the target identification information is used to determine target motion information from the candidate motion information set;
- a calculation module is configured to predict motion information of the image block to be processed according to the target motion information.
- the detection module is specifically configured to determine availability of an image block where the target pixel point is located.
- the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w×i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
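The candidate-position templates and the first-threshold constraint above can be sketched as follows. The enumeration bounds `max_i` and `max_j` are illustrative assumptions, not values given in the text:

```python
def left_candidate_positions(w, h, ctu_width, max_i=8, max_j=4):
    """Enumerate candidate pixel positions to the left of the image block to
    be processed, in the coordinate system whose origin is the block's
    upper-left pixel (x grows rightward, y grows downward), following the
    position templates listed above and stopping once w*i exceeds the
    first threshold (here taken as the CTU width)."""
    positions = []
    i = 1
    while i <= max_i and w * i <= ctu_width:  # first-threshold constraint
        for j in range(0, max_j + 1):
            positions.extend([
                (-w * i,     h * j - 1),
                (-w * i - 1, h * j - 1),
                (-w * i,     h * j),
                (-w * i - 1, h * j),
            ])
        positions.extend([(-1, h * i - 1 + h), (-1, h * i + h)])
        i += 1
    return positions
```

With w = h = 4 and a CTU width of 8, only i = 1 and i = 2 satisfy the constraint, so candidate columns stop two block-widths to the left.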
- the obtaining module is specifically configured to add the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the image block to be processed according to a preset order, wherein when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute values of the horizontal and vertical coordinates of a second candidate pixel point in the rectangular coordinate system.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
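The two distance definitions above, and the short-to-long ordering they induce, can be sketched as follows (the choice of (0, block_height - 1) as the lower-left vertex pixel is an assumption in the top-left-origin coordinate system):

```python
import math

def l1_distance(p):
    """Distance variant (a): sum of the absolute values of the horizontal
    and vertical coordinates of the candidate pixel point."""
    return abs(p[0]) + abs(p[1])

def lower_left_distance(p, block_height):
    """Distance variant (b): length of the straight line segment connecting
    the candidate pixel point and the pixel at the lower-left vertex of the
    image block to be processed, assumed here to be (0, block_height - 1)."""
    return math.hypot(p[0], p[1] - (block_height - 1))

def sort_short_to_long(points, key):
    """Order candidates from short to long distance, so earlier candidates
    can receive the shorter identification codewords (P <= Q)."""
    return sorted(points, key=key)
```

For instance, under the L1 distance the point (-1, 0) is ordered before (-5, 0), so it would be assigned the shorter codeword.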
- the candidate motion information set includes at least two pieces of identical motion information.
- the acquiring module is specifically configured to: sequentially acquire the available target pixel points; determine that the motion information of the currently acquired available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and add the motion information of the available target pixel points having different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
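The sequential acquisition with duplicate pruning and the second-threshold cap described above can be sketched as (the tuple representation of motion information is an assumption):

```python
def build_candidate_set(available_motion_infos, second_threshold):
    """Sequentially add the motion information of available target pixel
    points to the candidate motion information set, keeping only motion
    information that differs from what is already in the set, and stopping
    once the preset second threshold is reached."""
    candidates = []
    for info in available_motion_infos:
        if len(candidates) >= second_threshold:
            break
        if info not in candidates:  # only differing motion information
            candidates.append(info)
    return candidates
```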
- the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device is configured to decode the image block to be processed, and the indexing module is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the indexing module is specifically configured to: parse the code stream to obtain the target identification information.
- the device is configured to encode the image block to be processed, and the obtaining module is further configured to determine a combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module is further configured to: encode the obtained target identification information.
- the indexing module is further configured to: encode the target motion residual information.
- a device for predicting motion information, including a processor and a memory coupled to the processor, where the processor is configured to execute the method described in the first aspect or the second aspect above.
- a computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or the second aspect above.
- a computer program product containing instructions is provided, and when the instructions are run on a computer, the computer is caused to execute the method described in the first aspect or the second aspect above.
- FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application
- FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application.
- FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application.
- FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
- FIG. 5 is an exemplary flowchart of a merge prediction mode according to an embodiment of the present application.
- FIG. 6 is an exemplary flowchart of an advanced motion vector prediction mode according to an embodiment of the present application.
- FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application.
- FIG. 8 is an exemplary schematic diagram of a coding unit and an adjacent position image block associated with the coding unit in the embodiment of the present application;
- FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of the present application.
- FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 13 is another exemplary schematic diagram of a coding unit and an adjacent position image block associated with the coding unit in the embodiment of the present application;
- FIG. 14 is an exemplary flowchart of a method for predicting motion information according to an embodiment of the present application.
- FIG. 15 is an exemplary schematic diagram of an image block to be processed and an image block of an adjacent position associated with the image block to be processed in the embodiment of the present application;
- FIG. 16 is an exemplary schematic diagram of an acquisition sequence from right to left in an embodiment of the present application.
- FIG. 17 is an exemplary schematic diagram of an acquisition sequence from top to bottom in an embodiment of the present application.
- FIG. 18 is an exemplary schematic diagram of an obtaining sequence from the upper right to the lower left in the embodiment of the present application.
- FIGS. 19 to 30 are exemplary schematic diagrams of different acquisition sequences in the embodiment of the present application.
- FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of the present application.
- FIG. 32 is a block diagram of an exemplary structure of a motion information prediction apparatus according to an embodiment of the present application.
- FIG. 33 is another exemplary structural block diagram of a motion information prediction apparatus according to an embodiment of the present application.
- FIG. 34 is a schematic structural block diagram of a motion information prediction device in an embodiment of the present application.
- FIG. 1 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present application.
- video coder generally refers to both video encoders and video decoders.
- video coding or “coding” may generally refer to video encoding or video decoding.
- the video encoder 100 and the video decoder 200 of the video decoding system 1 are configured to predict the motion information of a current coded image block or its sub-blocks according to any of a variety of new inter prediction modes proposed in the present application, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; in this way, the motion vector difference need not be transmitted during encoding, thereby further improving the encoding and decoding performance.
- the video decoding system 1 includes a source device 10 and a destination device 20.
- the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
- the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
- Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
- the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other media that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
- the source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
- the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
- the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
- the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
- the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
- the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
- the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 10 to the destination device 20.
- the encoded data may be output from the output interface 140 to the storage device 40.
- the encoded data can be accessed from the storage device 40 through the input interface 240.
- the storage device 40 may include any of a variety of distributed or locally-accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, Or any other suitable digital storage medium for storing encoded video data.
- the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
- the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
- the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
- Example file servers include a web server (eg, for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
- the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
- This may include a wireless channel (e.g., Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
- the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
- the motion vector prediction technology of the present application can be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), for storage in data storage Encoding of video data on media, decoding of video data stored on data storage media, or other applications.
- the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
- the video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application can be applied to video decoding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, streamed over a network, and so on.
- the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
- encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and / or retrieve data from memory and decode data.
- the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
- the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
- Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
- the video encoder 100 may encode video data from the video source 120.
- the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
- the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
- the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
- the input interface 240 includes a receiver and / or a modem.
- the input interface 240 may receive the encoded video data via the link 30 and / or from the storage device 40.
- the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
- the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common or separate data stream.
- the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
- Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, any of which may be integrated in a corresponding device as part of a combined encoder/decoder (codec).
- This application may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
- the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode the compressed video data. This transfer can occur in real time or almost in real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in a coded stream to a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after the syntax element is stored on this medium.
- the video encoder 100 and the video decoder 200 may operate according to a video compression standard such as high efficiency video coding (HEVC, H.265), and may conform to the HEVC test model (HM).
- the latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265.
- the latest version of the standard document is H.265 (12/16), and the standard document is incorporated herein by reference in its entirety.
- HM assumes that video decoding devices have several additional capabilities over existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM provides up to 35 intra-prediction encoding modes.
- the H.266 test model is the evolution model of the video decoding device.
- the algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet. The latest algorithm description is included in JVET-F1001-v2.
- the algorithm description document is incorporated herein by reference in its entirety.
- reference software for the JEM test model can be obtained from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
- HM can divide a video frame or image into a sequence of tree blocks or maximum coding units (LCUs) containing both luminance and chrominance samples.
- LCUs are also known as coding tree units (CTUs).
- the tree block has a similar purpose as the macro block of the H.264 standard.
- a slice contains several consecutive tree blocks in decoding order.
- a video frame or image can be split into one or more slices.
- Each tree block can be split into coding units according to a quadtree. For example, a tree block that is a root node of a quad tree may be split into four child nodes, and each child node may be a parent node and split into another four child nodes.
- the final indivisible child nodes that are leaf nodes of the quadtree include decoding nodes, such as decoded video blocks.
- the syntax data associated with the decoded codestream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
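The quadtree splitting described above, bounded by the maximum split count and minimum decoding-node size carried in the syntax data, can be sketched as follows. The `decide_split` callback stands in for the encoder's mode decision and is a hypothetical name:

```python
def quadtree_split(x, y, size, min_size, depth, max_depth, decide_split):
    """Recursively split a tree block into decoding nodes. A node becomes a
    leaf when the maximum depth is reached, a further split would violate
    the minimum node size, or the mode decision declines to split."""
    if depth >= max_depth or size // 2 < min_size or not decide_split(x, y, size):
        return [(x, y, size)]  # leaf node: a decoding node
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree_split(x + dx, y + dy, half, min_size,
                                 depth + 1, max_depth, decide_split)
    return leaves
```

Splitting a 64×64 tree block once, for example, yields four 32×32 decoding nodes.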
- the coding unit (CU) includes a decoding node, and a prediction unit (PU) and a transform unit (TU) associated with the decoding node.
- the size of the CU corresponds to the size of the decoding node and the shape must be square.
- the size of the CU can range from 8 ⁇ 8 pixels to a maximum 64 ⁇ 64 pixels or larger tree block size.
- Each CU may contain one or more PUs and one or more TUs.
- the syntax data associated with a CU may describe a case where a CU is partitioned into one or more PUs.
- the partitioning mode may be different between cases where the CU is skipped or is encoded in direct mode, intra prediction mode, or inter prediction mode.
- the PU can be divided into non-square shapes.
- the syntax data associated with a CU may also describe a case where a CU is partitioned into one or more TUs according to a quadtree.
- the shape of the TU can be square or non-square.
- the HEVC standard allows transformation based on the TU, which can be different for different CUs.
- the TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case.
- the size of the TU is usually the same as or smaller than the PU.
- a quad-tree structure called "residual quad-tree" (RQT) can be used to subdivide the residual samples corresponding to the CU into smaller units.
- the leaf node of RQT may be called TU.
- the pixel difference values associated with the TU may be transformed to produce a transformation coefficient, which may be quantized.
- the PU contains data related to the prediction process.
- the PU may include data describing the intra-prediction mode of the PU.
- the PU may include data defining a motion vector of the PU.
- the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel accuracy or eighth-pixel accuracy), the reference image pointed to by the motion vector, and/or the reference image list of the motion vector (e.g., list 0, list 1, or list C).
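The motion data a PU may carry, as described above, can be grouped into an illustrative container (the field names and string values are assumptions for illustration, not codec syntax elements):

```python
from dataclasses import dataclass

@dataclass
class PuMotionInfo:
    """Illustrative container for the motion data a PU may carry."""
    mv_horizontal: int   # horizontal component of the motion vector
    mv_vertical: int     # vertical component of the motion vector
    mv_resolution: str   # e.g. "quarter-pel" or "eighth-pel" accuracy
    ref_index: int       # index of the reference image pointed to
    ref_list: str        # reference image list: "list0", "list1", or "listC"
```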
- TU uses transform and quantization processes.
- a given CU with one or more PUs may also contain one or more TUs.
- video encoder 100 may calculate a residual value corresponding to the PU.
- the residual values include pixel differences that can be transformed into transform coefficients, quantized, and scanned using TU to generate serialized transform coefficients for entropy decoding.
- This application generally uses the term "video block" to refer to the decoding node of a CU.
- the term “video block” may also be used in this application to refer to a tree block including a decoding node and a PU and a TU, such as an LCU or a CU.
- a video sequence usually contains a series of video frames or images.
- a group of pictures (GOP) exemplarily includes a series of one or more video pictures.
- the GOP may include syntax data in the header information of the GOP, the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP.
- Each slice of the image may contain slice syntax data describing the coding mode of the corresponding image.
- Video encoder 100 typically operates on video blocks within individual video slices to encode video data.
- a video block may correspond to a decoding node within a CU.
- Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
- HM supports prediction with various PU sizes. Assuming the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, and the other direction is partitioned into 25% and 75%.
- 2N×nU refers to a horizontally partitioned 2N×2N CU, where the 2N×0.5N PU is at the top and the 2N×1.5N PU is at the bottom.
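The symmetric and asymmetric PU partition sizes described above can be tabulated as follows (a sketch of the HM partition geometry; the mode-name strings are illustrative):

```python
def pu_sizes(mode, n):
    """Return the PU (width, height) pairs for a 2Nx2N CU under the HM
    partition modes, including the asymmetric 25%/75% splits."""
    two_n = 2 * n
    table = {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(two_n, n // 2), (two_n, two_n - n // 2)],  # 25% top, 75% bottom
        "2NxnD": [(two_n, two_n - n // 2), (two_n, n // 2)],  # 75% top, 25% bottom
        "nLx2N": [(n // 2, two_n), (two_n - n // 2, two_n)],  # 25% left, 75% right
        "nRx2N": [(two_n - n // 2, two_n), (n // 2, two_n)],  # 75% left, 25% right
    }
    return table[mode]
```

For a 16×16 CU (N = 8), mode 2N×nU yields a 16×4 PU at the top and a 16×12 PU at the bottom, matching the 2N×0.5N / 2N×1.5N description above.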
- “N×N” and “N by N” are used interchangeably to refer to the pixel size of a video block in the vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
- an N ⁇ N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
- Pixels in a block can be arranged in rows and columns.
- the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
- a block may include N ⁇ M pixels, where M is not necessarily equal to N.
- the video encoder 100 may calculate the residual data of the TU of the CU.
- a PU may include pixel data in a spatial domain (also referred to as a pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
- the residual data may correspond to a pixel difference between a pixel of an uncoded image and a prediction value corresponding to a PU.
- the video encoder 100 may form a TU including residual data of a CU, and then transform the TU to generate a transform coefficient of the CU.
- video encoder 100 may perform quantization of the transform coefficients.
- Quantization exemplarily refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
- the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
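The bit-depth reduction described above can be sketched as a toy shift-based rounding from n-bit to m-bit values (a simplified illustration of quantization, not the actual HEVC quantizer):

```python
def quantize_bit_depth(value, n_bits, m_bits):
    """Round an n-bit coefficient magnitude down to an m-bit value (n > m),
    discarding the low-order bits to reduce the amount of data."""
    assert n_bits > m_bits
    return value >> (n_bits - m_bits)  # keep the m most significant bits

def dequantize_bit_depth(value, n_bits, m_bits):
    """Approximate inverse: scale back up; the discarded low bits are lost,
    which is the source of quantization error."""
    return value << (n_bits - m_bits)
```

For example, the 10-bit value 1000 quantized to 6 bits becomes 62, and dequantizes back to 992 rather than 1000.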
- the JEM model further improves the coding structure of video images.
- a block coding structure called "Quad Tree Combined with Binary Tree” (QTBT) is introduced.
- a CU can be square or rectangular.
- a CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition.
- there are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
- the leaf nodes of a binary tree are called CUs.
- JEM's CUs cannot be further divided during the prediction and transformation process, which means that JEM's CU, PU, and TU have the same block size.
- the maximum size of the CTU is 256 ⁇ 256 luminance pixels.
- the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
- the video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may perform context-adaptive variable length decoding (CAVLC), context-adaptive binary arithmetic decoding (CABAC), syntax-based context-adaptive binary arithmetic decoding (SBAC), probability interval partition entropy (PIPE) decoding, or other entropy decoding methods to entropy decode the one-dimensional vector.
- Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 to decode the video data.
- video encoder 100 may assign a context within a context model to a symbol to be transmitted. Context can be related to whether adjacent values of a symbol are non-zero.
- the video encoder 100 may select a variable length code of a symbol to be transmitted. Codewords in Variable Length Decoding (VLC) may be constructed such that relatively short codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, the use of VLC can achieve the goal of saving code rates relative to using equal length codewords for each symbol to be transmitted.
- the probability in CABAC can be determined based on the context assigned to the symbol.
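The rate-saving idea behind VLC described above can be illustrated with a toy unary code that assigns the shortest codeword to the most probable symbol (this is a didactic scheme, not the actual CAVLC code tables):

```python
def build_vlc_table(symbols_by_probability):
    """Assign prefix-free unary codewords so that symbols earlier in the
    probability ranking (more likely) receive shorter codes, saving rate
    relative to equal-length codewords."""
    return {sym: "1" * rank + "0"
            for rank, sym in enumerate(symbols_by_probability)}
```

With symbols ranked A, B, C from most to least probable, A gets the 1-bit code "0" while C gets the 3-bit code "110".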
- the video encoder may perform inter prediction to reduce temporal redundancy between images.
- a CU may have one or more prediction units (PUs) according to the provisions of different video compression codec standards.
- multiple PUs may belong to a CU, or PUs and CUs are the same size.
- when the PU and the CU are the same size, the CU's partitioning mode is that the CU is not divided, or the CU is divided into one PU; the term PU is used uniformly in the description herein.
- the video encoder may signal motion information for the PU to the video decoder.
- the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
- a motion vector may indicate a displacement between an image block (also called a video block, a pixel block, a pixel set, etc.) of a PU and a reference block of the PU.
- the reference block of the PU may be a part of the reference picture similar to the image block of the PU.
- the reference block may be located in a reference image indicated by a reference image index and a prediction direction identifier.
- the video encoder may generate a candidate prediction motion vector (MV) list for each of the PUs according to the merge prediction mode or the advanced motion vector prediction mode process.
- Each candidate prediction motion vector in the candidate prediction motion vector list for the PU may indicate motion information.
- the motion information indicated by some candidate prediction motion vectors in the candidate prediction motion vector list may be based on the motion information of other PUs. If the candidate prediction motion vector indicates motion information specifying one of a spatial candidate prediction motion vector position or a temporal candidate prediction motion vector position, the present application may refer to the candidate prediction motion vector as an "original" candidate prediction motion vector.
- a merge mode also referred to herein as a merge prediction mode
- the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying the original candidate prediction motion vectors, or inserting only zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered as original candidate prediction motion vectors and may be referred to as artificially generated candidate prediction motion vectors in this application.
- the techniques of this application generally relate to a technique for generating a list of candidate prediction motion vectors at a video encoder and a technique for generating the same list of candidate prediction motion vectors at a video decoder.
- the video encoder and video decoder may generate the same candidate prediction motion vector list by implementing the same techniques used to construct the candidate prediction motion vector list. For example, both a video encoder and a video decoder may build a list with the same number of candidate prediction motion vectors (eg, five candidate prediction motion vectors).
- Video encoders and decoders may first consider spatial candidate prediction motion vectors (e.g., neighboring blocks in the same image), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different images), and finally add artificially generated candidate prediction motion vectors until the desired number of candidate prediction motion vectors has been added to the list.
- a pruning operation may be used for certain types of candidate prediction motion vectors during the construction of the candidate prediction motion vector list to remove duplicates from the list, while for other types of candidate prediction motion vectors, pruning may be omitted to reduce decoder complexity.
- a pruning operation may be performed to exclude candidate prediction motion vectors with duplicate motion information from the list of candidate prediction motion vectors.
- artificially generated candidate prediction motion vectors may be added without performing a pruning operation on them.
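The list-construction order and pruning behavior described above can be sketched as follows; the function name, the `(mv_x, mv_y, ref_idx)` tuple layout, and the five-candidate limit are illustrative assumptions, not taken from any codec's source:

```python
# Illustrative sketch: spatial candidates first, then temporal, then zero-MV
# fill. "Original" (spatial/temporal) candidates are pruned for duplicates;
# artificially generated candidates are appended without pruning.
MAX_CANDIDATES = 5  # e.g., both encoder and decoder agree on five candidates

def build_candidate_list(spatial, temporal):
    """spatial/temporal: lists of (mv_x, mv_y, ref_idx) tuples."""
    candidates = []
    for cand in spatial + temporal:          # original candidates
        if cand not in candidates:           # pruning: drop duplicate motion info
            candidates.append(cand)
        if len(candidates) == MAX_CANDIDATES:
            return candidates
    while len(candidates) < MAX_CANDIDATES:  # artificial candidates, no pruning
        candidates.append((0, 0, 0))         # zero motion vector
    return candidates
```

Because both sides run the same procedure over the same inputs, the encoder and decoder arrive at identical lists, which is what lets an index alone identify the selected candidate.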
- the video encoder may select the candidate prediction motion vector from the candidate prediction motion vector list and output the candidate prediction motion vector index in the code stream.
- the selected candidate prediction motion vector may be a candidate prediction motion vector having a motion vector that most closely matches the predictor of the target PU being decoded.
- the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
- the video encoder may also generate a predictive image block for the PU based on a reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector.
- the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector.
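The two derivations just described, copying the candidate's motion information directly versus adding a motion vector difference to it, can be sketched as below; the function and parameter names are illustrative:

```python
# Sketch of deriving PU motion information from the selected candidate:
# with no MVD (merge-style), the candidate's motion vector is reused as-is;
# with an MVD (AMVP-style), the difference is added to the candidate.
def derive_motion_info(candidate_mv, mvd=None):
    """candidate_mv: (x, y); mvd: (dx, dy), or None to copy the candidate."""
    if mvd is None:                       # copy candidate motion information
        return candidate_mv
    return (candidate_mv[0] + mvd[0],     # predictor + signaled difference
            candidate_mv[1] + mvd[1])
```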
- the video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PU of the CU and the original image blocks for the CU. The video encoder may then encode one or more residual image blocks and output one or more residual image blocks in a code stream.
- the codestream may include data identifying a selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
- the video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
- the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate predictive image blocks for the PU based on the one or more reference blocks of the PU.
- the video decoder may reconstruct an image block for a CU based on a predictive image block for a PU of the CU and one or more residual image blocks for the CU.
- the present application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description can be interpreted to mean that the position or image block and the image block associated with the CU or PU have various spatial relationships.
- a PU currently being decoded by a video decoder may be referred to as a current PU, and may also be referred to as a current image block to be processed.
- This application may refer to the CU that the video decoder is currently decoding as the current CU.
- This application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that this application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used to represent both.
- video encoder 100 may use inter prediction to generate predictive image blocks and motion information for a PU of a CU.
- the motion information of a given PU may be the same or similar to the motion information of one or more nearby PUs (ie, PUs whose image blocks are spatially or temporally near the image blocks of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may refer to the motion information of nearby PUs to encode motion information for a given PU. Encoding the motion information of a given PU with reference to the motion information of nearby PUs can reduce the number of encoding bits required to indicate the motion information of a given PU in the code stream.
- Video encoder 100 may refer to motion information of nearby PUs in various ways to encode motion information for a given PU.
- video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs.
- This application may use a merge mode to refer to indicating that the motion information of a given PU is the same as that of nearby PUs or may be derived from the motion information of nearby PUs.
- the video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU.
- MVD indicates the difference between the motion vector of a given PU and the motion vector of a nearby PU.
- Video encoder 100 may include the MVD, rather than the motion vector itself, in the motion information of the given PU. Representing the MVD in the codestream requires fewer coding bits than representing the full motion vector of the given PU.
- This application may use advanced motion vector prediction mode to refer to signaling the motion information of a given PU by using the MVD and an index value identifying a candidate motion vector.
- the video encoder 100 may generate a list of candidate predicted motion vectors for a given PU.
- the candidate prediction motion vector list may include one or more candidate prediction motion vectors.
- Each of the candidate prediction motion vectors in the candidate prediction motion vector list for a given PU may specify motion information.
- the motion information indicated by each candidate prediction motion vector may include a motion vector, a reference image index, and a prediction direction identifier.
- the candidate prediction motion vectors in the candidate prediction motion vector list may include "original" candidate prediction motion vectors, each of which indicates motion information at one of the specified candidate prediction motion vector positions within a PU different from the given PU.
- the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, a video encoder may compare each candidate prediction motion vector with the PU being decoded and may select a candidate prediction motion vector with a desired code rate-distortion cost. Video encoder 100 may output a candidate prediction motion vector index for a PU. The candidate prediction motion vector index may identify the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
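The selection step can be sketched as below; the cost function is passed in abstractly because the preceding bullet leaves the exact rate-distortion metric open, and the names are illustrative:

```python
# Illustrative candidate selection: evaluate each candidate with a caller-
# supplied rate-distortion cost function and signal the index of the cheapest
# one (the index identifies its position in the shared candidate list).
def select_candidate(candidates, rd_cost):
    costs = [rd_cost(c) for c in candidates]
    best_index = costs.index(min(costs))   # position written to the bitstream
    return best_index, candidates[best_index]
```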
- the video encoder 100 may generate a predictive image block for a PU based on a reference block indicated by motion information of the PU.
- the motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU.
- the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- motion information of a PU may be determined based on a motion vector difference for the PU and motion information indicated by a selected candidate prediction motion vector.
- Video encoder 100 may process predictive image blocks for a PU as described previously.
- video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU.
- the candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU.
- the syntax element parsed from the bitstream may indicate the position of the candidate prediction motion vector selected in the candidate prediction motion vector list of the PU.
- the video decoder 200 may generate predictive image blocks for the PU based on one or more reference blocks indicated by the motion information of the PU.
- Video decoder 200 may determine motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU. Video decoder 200 may reconstruct an image block for a CU based on a predictive image block for a PU and a residual image block for a CU.
- the construction of the candidate prediction motion vector list and the parsing of the selected candidate prediction motion vector from the code stream are independent of each other, and can be performed in any order or in parallel.
- the position of the selected candidate prediction motion vector in the candidate prediction motion vector list is first parsed from the code stream, and a candidate prediction motion vector list is constructed based on the parsed position.
- if the selected candidate prediction motion vector, obtained by parsing the bitstream, is the candidate prediction motion vector with index 3 in the candidate prediction motion vector list, only the candidate prediction motion vectors from index 0 to index 3 need to be constructed to determine the candidate with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
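The parse-first shortcut can be sketched as follows; `candidate_source` stands in for the agreed derivation order (spatial, temporal, artificial), and the names are illustrative:

```python
# Decode-order optimization sketch: the selected index is parsed from the
# bitstream first, so the candidate list only needs to be built up to that
# index instead of in full.
def build_list_up_to(candidate_source, parsed_index):
    partial_list = []
    for cand in candidate_source:          # candidates in the agreed order
        partial_list.append(cand)
        if len(partial_list) == parsed_index + 1:
            break                          # e.g., index 3 -> build only 0..3
    return partial_list[parsed_index]
```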
- FIG. 2 is a block diagram of a video encoder 100 according to an example described in the embodiment of the present application.
- the video encoder 100 is configured to output a video to the post-processing entity 41.
- the post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
- the post-processing entity 41 may be an instance of a network entity.
- the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
- the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
- the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded image buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
- the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
- the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
- the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- the filter unit 106 is shown as an in-loop filter in FIG. 2A, in other implementations, the filter unit 106 may be implemented as a post-loop filter.
- the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
- the video data memory may store video data to be encoded by the components of the video encoder 100.
- the video data stored in the video data storage may be obtained from the video source 120.
- the DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame decoding mode.
- Video data memory and DPB 107 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
- Video data storage and DPB 107 can be provided by the same storage device or separate storage devices.
- the video data memory may be on-chip with other components of video encoder 100 or off-chip relative to those components.
- the video encoder 100 receives video data and stores the video data in a video data memory.
- the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
- Video encoder 100 typically illustrates components that encode image blocks within a video slice to be encoded.
- the slice can be divided into multiple image blocks (and possibly into collections of image blocks referred to as tiles).
- the prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes.
- the prediction processing unit 108 may provide the resulting intra- or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct an encoded block used as a reference image.
- the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
- the inter predictor 110 within the prediction processing unit 108 may perform inter predictive coding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
- the inter predictor 110 may be configured to determine an inter prediction mode for encoding the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate the rate-distortion values of the various inter prediction modes in the set of candidate inter prediction modes, and select from them the inter prediction mode with the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between the coded block and the original uncoded block from which it was produced, as well as the bit rate (that is, the number of bits) used to produce the coded block. For example, the inter predictor 110 may determine that the inter prediction mode in the candidate set with the lowest rate-distortion cost for encoding the current image block is the inter prediction mode used for inter prediction of the current image block.
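A rate-distortion cost of this kind is commonly expressed as J = D + λ·R, where D is a distortion measure, R the number of coding bits, and λ the trade-off weight. A minimal sketch, assuming a sum-of-squared-errors distortion and a caller-supplied bit count (both assumptions, not the patent's specific cost):

```python
# Illustrative rate-distortion cost J = D + lambda * R:
#   D = sum of squared errors between original and reconstructed samples,
#   R = number of bits needed to code the block, weighted by lam.
def rd_cost(original, reconstructed, bits, lam):
    distortion = sum((o - r) ** 2
                     for o, r in zip(original, reconstructed))
    return distortion + lam * bits
```

The mode or candidate minimizing this cost is the one the encoder selects and signals.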
- the inter predictor 110 is configured to predict motion information (for example, motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and use the motion information (for example, motion vectors) of the sub-blocks to obtain or generate a prediction block of the current image block.
- the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
- the inter predictor 110 may also generate syntax elements associated with image blocks and video slices for use by the video decoder 200 when decoding image blocks of the video slice.
- the inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block;
- the inter predictor 110 performs motion estimation and motion compensation processes.
- the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
- the intra predictor 109 may perform intra prediction on the current image block.
- the intra predictor 109 may determine an intra prediction mode used to encode the current block.
- the intra predictor 109 may use rate-distortion analysis to calculate the rate-distortion values of the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from among the tested modes. In any case, after an intra prediction mode is selected for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
- the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
- the summer 112 represents one or more components that perform this subtraction operation.
- the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
- the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
- the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
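The pixel-to-frequency transform can be illustrated with a 1-D DCT-II on a row of residual samples; real codecs use 2-D integer approximations of this transform, so this is a conceptual sketch only:

```python
# 1-D orthonormal DCT-II of a residual row: a constant (flat) residual maps
# entirely to the DC coefficient, illustrating the move to the frequency domain.
import math

def dct_ii(residual):
    n = len(residual)
    coeffs = []
    for k in range(n):
        s = sum(r * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, r in enumerate(residual))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * s)
    return coeffs
```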
- the transformer 101 may send the obtained transform coefficients to a quantizer 102.
- a quantizer 102 quantizes the transform coefficients to further reduce the bit code rate.
- the quantizer 102 may then perform a scan of a matrix containing the quantized transform coefficients.
- the entropy encoder 103 may perform scanning.
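The scan of the quantized-coefficient matrix serializes the 2-D block into a 1-D order for entropy coding. A zig-zag scan is a common choice and is sketched below as an illustration; actual scan patterns vary by codec and mode:

```python
# Zig-zag scan: traverse anti-diagonals of an n x n block, alternating
# direction, so low-frequency coefficients come first in the output.
def zigzag_scan(block):
    """block: n x n list of lists -> flat list in zig-zag order."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],                      # diagonal
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[i][j] for i, j in order]
```

Front-loading the low-frequency coefficients groups the trailing zeros produced by quantization, which the entropy coder can then represent compactly.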
- After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
- the encoded code stream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200.
- the entropy encoder 103 may also perform entropy encoding on other syntax elements.
- the inverse quantizer 104 and the inverse transformer 105 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference image.
- the summer 111 adds the reconstructed residual block to a prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
- the filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer 107 and can be used by the inter predictor 110 as a reference block to perform inter prediction on blocks in subsequent video frames or images.
- the video encoder 100 may directly quantize the residual signal without processing by the transformer 101 and, correspondingly, without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; or, the video encoder 100 may store the reconstructed image blocks directly as reference blocks without processing by the filter unit 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
- FIG. 3 is a block diagram of an example video decoder 200 described in the embodiment of the present application.
- the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207.
- the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
- video decoder 200 may perform a decoding process that is substantially inverse to the encoding process described with respect to video encoder 100 from FIG. 2.
- the video decoder 200 receives from the video encoder 100 an encoded video codestream representing image blocks of the encoded video slice and associated syntax elements.
- the video decoder 200 may receive video data from the network entity 42, optionally, the video data may also be stored in a video data storage (not shown in the figure).
- the video data memory may store video data, such as an encoded video code stream, to be decoded by components of the video decoder 200.
- the video data stored in the video data memory can be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
- the video data memory can serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories. The video data memory and DPB 207 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200 or provided off-chip relative to those components.
- the network entity 42 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
- the network entity 42 may or may not include a video encoder, such as video encoder 100.
- the network entity 42 may implement some of the techniques described in this application.
- the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
- the network entity 42 may be an example of the storage device 40 of FIG. 1.
- the entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements.
- the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
- Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
- the intra predictor 209 of the prediction processing unit 208 may generate prediction blocks for the image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
- the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode.
- the inter predictor 210 may determine whether to use a new inter prediction mode to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block of the current video slice or of a sub-block of the current image block, and then uses that motion information to obtain or generate a prediction block of the current image block or its sub-block through a motion compensation process.
- the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
- a prediction block may be generated from one of reference pictures within one of the reference picture lists.
- the video decoder 200 may construct a reference image list, that is, a list 0 and a list 1, based on the reference images stored in the DPB 207.
- the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
- the video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal both whether a new inter prediction mode is used and which new inter prediction mode is used to decode the specific block. It should be understood that the inter predictor 210 here performs a motion compensation process.
- the inverse quantizer 204 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the code stream and decoded by the entropy decoder 203.
- the inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
- the inverse transformer 205 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process to generate a residual block in the pixel domain.
- the video decoder 200 obtains the reconstructed block, that is, the decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210.
- the summer 211 represents a component that performs this summing operation.
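The summing operation is element-wise: decoded residual plus prediction, clipped to the valid sample range. A minimal sketch, assuming 8-bit samples and flat sample lists (both illustrative assumptions):

```python
# Decoder-side reconstruction: residual + prediction per sample, clipped to
# [0, 2^bit_depth - 1] so the result is a valid pixel value.
def reconstruct(residual, prediction, bit_depth=8):
    max_val = (1 << bit_depth) - 1
    return [min(max(r + p, 0), max_val)
            for r, p in zip(residual, prediction)]
```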
- if needed, a loop filter (in or after the decoding loop) may be used to smooth pixel transitions or otherwise improve the video quality.
- the filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- the filter unit 206 is shown as an in-loop filter in FIG. 2B, in other implementations, the filter unit 206 may be implemented as a post-loop filter.
- the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as the decoded video stream.
- a decoded image block in a given frame or image may also be stored in a decoded image buffer 207, and the decoded image buffer 207 stores a reference image for subsequent motion compensation.
- the decoded image buffer 207 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 220 of FIG. 1, or may be separate from such memory.
- the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients and accordingly does not need processing by the inverse quantizer 204 and the inverse transformer 205.
- the techniques of this application exemplarily involve inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video decoders described in this application.
- the video decoder includes, for example, the video encoder 100 and the video decoder 200 shown and described with respect to FIGS. 2 and 3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data.
- a reference to a generic "video encoder" or "video decoder” may include video encoder 100, video decoder 200, or another video encoding or coding unit.
- FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
- the inter prediction module 121 may include a motion estimation unit 42 and a motion compensation unit 44.
- the relationship between PU and CU is different in different video compression codecs.
- the inter prediction module 121 may partition a current CU into a PU according to a plurality of partitioning modes.
- the inter prediction module 121 may partition a current CU into a PU according to 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, and N ⁇ N partition modes.
- the current CU is the current PU, which is not limited.
- the inter prediction module 121 may perform integer motion estimation (IME) and then perform fractional motion estimation (FME) on each of the PUs.
- the inter prediction module 121 may search a reference block for a PU in one or more reference images. After the reference block for the PU is found, the inter prediction module 121 may generate a motion vector indicating the spatial displacement between the PU and the reference block for the PU with integer precision.
- the inter prediction module 121 may refine, by performing FME on the PU, the motion vector generated by performing IME on the PU.
- a motion vector generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.).
- the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
- the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
- the candidate prediction motion vector list may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
- the inter prediction module 121 may select the candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference (MVD) for the PU.
- the MVD for a PU may indicate a difference between a motion vector indicated by a selected candidate prediction motion vector and a motion vector generated for the PU using IME and FME.
- the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
- the inter prediction module 121 may also output the MVD of the PU.
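The encoder-side steps above (select a candidate predictor, form the MVD, output the candidate index and the MVD) can be sketched as follows. This is a minimal illustration, not the patent's normative procedure; the function names and the bit-cost heuristic are assumptions introduced here for clarity.

```python
# Illustrative sketch: choosing a candidate prediction motion vector and
# computing the motion vector difference (MVD) the encoder would signal.
# mvd_bits is a hypothetical proxy for coding cost, not a real entropy coder.

def mvd_bits(mvd):
    """Rough proxy for the bits needed to code an MVD (larger -> more bits)."""
    return abs(mvd[0]) + abs(mvd[1])

def select_candidate(candidates, actual_mv):
    """Pick the candidate whose motion vector yields the cheapest MVD and
    return (candidate index, MVD), the two values output by the encoder."""
    best_idx, best_mvd = None, None
    for idx, cand_mv in enumerate(candidates):
        mvd = (actual_mv[0] - cand_mv[0], actual_mv[1] - cand_mv[1])
        if best_mvd is None or mvd_bits(mvd) < mvd_bits(best_mvd):
            best_idx, best_mvd = idx, mvd
    return best_idx, best_mvd

# Motion vector found for the PU via IME and FME (illustrative units).
actual_mv = (18, -7)
# Candidate prediction motion vector list (illustrative values).
candidates = [(16, -8), (0, 0)]
idx, mvd = select_candidate(candidates, actual_mv)
# idx identifies the selected candidate's position in the list; mvd is
# signaled alongside it.
```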
- a detailed implementation of the advanced motion vector prediction (AMVP) mode in this embodiment of the present application is described in detail below with reference to FIG. 6.
- the inter prediction module 121 may also perform a merge operation on each of the PUs.
- the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
- the candidate prediction motion vector list for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
- the original candidate prediction motion vector in the candidate prediction motion vector list may include one or more spatial candidate prediction motion vectors and temporal candidate prediction motion vectors.
- the spatial candidate prediction motion vector may indicate motion information of other PUs in the current image.
- the temporal candidate prediction motion vector may be based on motion information of a corresponding PU in a picture different from the current picture.
- the temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
- the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- FIG. 5, described below, illustrates an exemplary flowchart of the merge mode.
- the inter prediction module 121 may select either the predictive image block generated through the FME operation or the predictive image block generated through the merge operation. In some feasible implementations, the inter prediction module 121 may select a predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
- the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may select the partitioning mode based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes.
- the inter prediction module 121 may output a predictive image block associated with a PU belonging to the selected partition mode to the residual generation module 102.
- the inter prediction module 121 may output a syntax element indicating motion information of a PU belonging to the selected partitioning mode to the entropy encoding module 116.
- the inter prediction module 121 includes IME modules 180A to 180N (collectively referred to as "IME module 180"), FME modules 182A to 182N (collectively referred to as "FME module 182"), merge modules 184A to 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A to 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a mode decision process from CTU to CU).
- the IME module 180, the FME module 182, and the merge module 184 may perform an IME operation, an FME operation, and a merge operation on a PU of the current CU.
- the inter prediction module 121 is illustrated in the schematic diagram of FIG. 4 as including a separate IME module 180, an FME module 182, and a merging module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, an FME module 182, and a merge module 184 for each PU of each partitioning mode of the CU.
- the IME module 180A, the FME module 182A, and the merge module 184A may perform IME operations, FME operations, and merge operations on a PU generated by dividing a CU according to a 2N ⁇ 2N split mode.
- the PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
- the IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on a left PU generated by dividing a CU according to an N ⁇ 2N division mode.
- the PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
- the IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on a right PU generated by dividing a CU according to an N ⁇ 2N division mode.
- the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
- the IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on a lower right PU generated by dividing a CU according to an N×N division mode.
- the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
- the PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of a plurality of possible predictive image blocks, and select the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-constrained applications, the PU mode decision module 186 may prefer to select predictive image blocks that increase the compression ratio, while for other applications, the PU mode decision module 186 may prefer to select predictive image blocks that increase the quality of the reconstructed video.
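The rate-distortion trade-off described above can be sketched as a Lagrangian cost comparison. This is a toy illustration under assumed numbers; the option names, distortion values, and lambda weights are invented here and do not come from the patent.

```python
# Illustrative rate-distortion mode decision: cost = distortion + lambda * rate.
# A large lambda penalizes bits (bandwidth-constrained case); a small lambda
# favors low distortion (quality-oriented case). All values are hypothetical.

def rd_cost(distortion, rate_bits, lam):
    return distortion + lam * rate_bits

def choose_prediction(options, lam):
    """options: list of (name, distortion, rate_bits); returns the name of
    the option with the lowest rate-distortion cost."""
    return min(options, key=lambda o: rd_cost(o[1], o[2], lam))[0]

options = [
    ("fme", 120.0, 40),    # accurate prediction, but more bits for the MVD
    ("merge", 150.0, 6),   # slightly worse prediction, very cheap to signal
]
# Bandwidth-constrained application: large lambda favors fewer bits.
low_rate_choice = choose_prediction(options, lam=4.0)
# Quality-oriented application: small lambda favors lower distortion.
high_quality_choice = choose_prediction(options, lam=0.1)
```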
- the CU mode decision module 188 selects a partitioning mode for the current CU and outputs the predictive image block and motion information of the PU belonging to the selected partitioning mode.
- FIG. 5 is an exemplary flowchart of a merge mode in an embodiment of the present application.
- a video encoder (e.g., video encoder 100) may perform the merge operation 200.
- the video encoder may perform a merge operation different from the merge operation 200.
- the video encoder may perform a merge operation, where the video encoder performs more or fewer steps than the merge operation 200 or steps different from the merge operation 200.
- the video encoder may perform the steps of the merge operation 200 in a different order or in parallel.
- the encoder may also perform a merge operation 200 on a PU encoded in a skip mode.
- the video encoder may generate a list of candidate predicted motion vectors for the current PU (202).
- the video encoder may generate a list of candidate prediction motion vectors for the current PU in various ways. For example, the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one of the example techniques described below with respect to FIGS. 8-12.
- the candidate prediction motion vector list for the current PU may include a temporal candidate prediction motion vector.
- the temporal candidate prediction motion vector may indicate motion information of a co-located PU in the time domain.
- a co-located PU may be spatially in the same position in the image frame as the current PU, but in a reference picture instead of the current picture.
- a reference picture that includes the co-located PU may be referred to as a related reference picture.
- a reference image index of a related reference image may be referred to as a related reference image index in this application.
- the current image may be associated with one or more reference image lists (e.g., list 0, list 1, etc.).
- the reference image index may indicate a reference image by indicating a position in a reference image list of the reference image.
- the current image may be associated with a combined reference image list.
- the related reference picture index is the reference picture index of the PU covering the reference index source position associated with the current PU.
- the reference index source location associated with the current PU is adjacent to the left of the current PU or above the current PU.
- in this application, a PU is said to "cover" a specific location if the image block associated with the PU includes that location.
- the video encoder can use a zero reference image index.
- the reference index source location associated with the current PU is within the current CU.
- a PU may need to access motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use motion information (ie, a reference picture index) of a PU belonging to the current CU to generate a temporal candidate prediction motion vector for the current PU. In other words, these video encoders may use motion information of a PU belonging to the current CU to generate the temporal candidate prediction motion vector. Therefore, the video encoder may not be able to generate candidate prediction motion vector lists in parallel for the current PU and for the PU covering the reference index source position associated with the current PU.
- the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations where the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (e.g., 0).
- the video encoder may generate a temporal candidate prediction motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate prediction motion vector in the candidate prediction motion vector list of the current CU.
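Derivation of a temporal candidate with an explicitly set (fixed) reference picture index, as described above, can be sketched as a simple lookup that never touches the motion information of other PUs in the current CU. The data layout (pictures as dictionaries keyed by block position) and the constant name are assumptions for illustration.

```python
# Illustrative sketch: the temporal candidate is read from the co-located
# block in the reference picture selected by a fixed, predefined preset
# reference picture index, enabling parallel list construction.

PRESET_REF_IDX = 0  # fixed, predefined preset reference picture index (e.g., 0)

def temporal_candidate(ref_picture_list, colocated_pos):
    """ref_picture_list: list of pictures, each modeled as a dict mapping a
    block position to that block's motion vector. Returns the temporal
    candidate MV, or None if the co-located block has no motion info."""
    ref_picture = ref_picture_list[PRESET_REF_IDX]
    return ref_picture.get(colocated_pos)

# Hypothetical list 0: picture 0 holds motion (2, -1) at position (64, 64).
list0 = [{(64, 64): (2, -1)}, {(64, 64): (9, 9)}]
cand = temporal_candidate(list0, (64, 64))
```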
- the video encoder may explicitly signal the related reference picture index in a syntax structure (e.g., a picture header, a slice header, an APS, or another syntax structure).
- the video encoder may signal the relevant reference picture index to the decoder for each LCU (ie, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of the CU is equal to "1".
- the relevant reference image index may be set implicitly rather than explicitly.
- the video encoder may use the motion information of PUs in the reference pictures indicated by the reference picture indexes of PUs covering locations outside the current CU to generate each temporal candidate prediction motion vector in the candidate prediction motion vector lists for the PUs of the current CU, even if these locations are not strictly adjacent to the current PU.
- the video encoder may generate predictive image blocks associated with the candidate prediction motion vectors in the candidate prediction motion vector list (204).
- the video encoder may generate the predictive image block associated with a candidate prediction motion vector by determining the motion information of the current PU based on the motion information indicated by the candidate prediction motion vector, and then generating the predictive image block based on one or more reference blocks indicated by the motion information of the current PU.
- the video encoder may then select one of the candidate prediction motion vectors from the candidate prediction motion vector list (206).
- the video encoder can select candidate prediction motion vectors in various ways. For example, the video encoder may select one of the candidate prediction motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate prediction motion vectors.
- the video encoder may output a candidate prediction motion vector index (208).
- the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
- the candidate prediction motion vector index may be represented as "merge_idx".
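The merge operation steps (202) through (208) above can be condensed into a short end-to-end sketch: build the candidate list, evaluate a cost per candidate (standing in for generating and costing the predictive blocks), and emit the index of the winner as merge_idx. The helper names and the toy cost function are assumptions for illustration.

```python
# Illustrative sketch of the merge operation flow: (202) candidate list,
# (204)/(206) per-candidate evaluation and selection, (208) output merge_idx.

def merge_operation(candidate_mvs, cost_of):
    """candidate_mvs: candidate prediction motion vector list (202).
    cost_of: callable mapping a candidate MV to an RD-style cost, standing in
    for generating the predictive image block (204) and costing it.
    Returns merge_idx (208): the position of the selected candidate (206)."""
    costs = [cost_of(mv) for mv in candidate_mvs]
    return costs.index(min(costs))

# Hypothetical candidate list and a toy cost: L1 distance from the "true"
# motion of the block, used only to make the selection concrete.
true_mv = (5, 3)
candidates = [(0, 0), (4, 3), (5, 2)]
cost = lambda mv: abs(mv[0] - true_mv[0]) + abs(mv[1] - true_mv[1])
merge_idx = merge_operation(candidates, cost)
```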
- FIG. 6 is an exemplary flowchart of an advanced motion vector prediction (AMVP) mode in an embodiment of the present application.
- a video encoder (e.g., video encoder 100) may generate one or more motion vectors for the current PU (211).
- the video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU.
- the current image may be associated with two reference image lists (List 0 and List 1).
- the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU.
- the list 0 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 0.
- the list 1 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 1.
- the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
- the video encoder may generate predictive image blocks for the current PU (212).
- the video encoder may generate predictive image blocks for the current PU based on one or more reference blocks indicated by one or more motion vectors for the current PU.
- the video encoder may generate a list of candidate predicted motion vectors for the current PU (213).
- the video encoder may generate the list of candidate prediction motion vectors for the current PU in various ways.
- the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one or more of the possible implementations described below with respect to FIGS. 8 to 12.
- the list of candidate prediction motion vectors may be limited to two candidate prediction motion vectors.
- the list of candidate prediction motion vectors may include more candidate prediction motion vectors (e.g., five candidate prediction motion vectors).
- the video encoder may generate one or more motion vector differences (MVD) for each candidate prediction motion vector in the list of candidate prediction motion vectors (214).
- the video encoder may generate a motion vector difference for the candidate prediction motion vector by determining a difference between the motion vector indicated by the candidate prediction motion vector and a corresponding motion vector of the current PU.
- if the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bi-directionally predicted, the video encoder may generate two MVDs for each candidate prediction motion vector.
- the first MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 0 motion vector of the current PU.
- the second MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 1 motion vector of the current PU.
- the video encoder may select one or more of the candidate prediction motion vectors from the candidate prediction motion vector list (215).
- the video encoder may select one or more candidate prediction motion vectors in various ways. For example, a video encoder may select a candidate prediction motion vector with an associated motion vector that matches the motion vector to be encoded with minimal error, which may reduce the number of bits required to represent the motion vector difference for the candidate prediction motion vector.
- the video encoder may output one or more reference picture indexes for the current PU, one or more candidate prediction motion vector indexes, and one or more motion vector differences for the one or more selected candidate prediction motion vectors (216).
- the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1").
- the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may also output the MVD of the list 0 motion vector or of the list 1 motion vector for the current PU.
- the video encoder may output the reference picture index for list 0 ("ref_idx_l0") and the reference picture index for list 1 ("ref_idx_l1").
- the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may also output the MVD of the list 0 motion vector for the current PU and the MVD of the list 1 motion vector for the current PU.
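The AMVP output for a bi-predicted PU described above (one reference index, one predictor flag, and one MVD per reference list) can be sketched as assembling the syntax elements the encoder would emit in step (216). The dictionary keys mirror the syntax element names; the motion values are hypothetical.

```python
# Illustrative sketch of AMVP signaling for a bi-predicted PU: ref_idx_l0/
# ref_idx_l1, mvp_l0_flag/mvp_l1_flag, and one MVD per list. Values assumed.

def amvp_signal(list0, list1):
    """Each argument: (actual_mv, predictor_mv, ref_idx, mvp_flag).
    Returns a dict of the syntax elements the encoder would output (216)."""
    out = {}
    for name, (mv, pred, ref_idx, flag) in (("l0", list0), ("l1", list1)):
        out["ref_idx_" + name] = ref_idx
        out["mvp_%s_flag" % name] = flag
        # MVD: difference between the PU's motion vector and the predictor.
        out["mvd_" + name] = (mv[0] - pred[0], mv[1] - pred[1])
    return out

syntax = amvp_signal(
    list0=((10, 4), (8, 4), 0, 0),   # list 0: MV, predictor, ref idx, flag
    list1=((-3, 1), (0, 0), 1, 1),   # list 1: MV, predictor, ref idx, flag
)
```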
- FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder (such as video decoder 200) in an embodiment of the present application.
- the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
- the video decoder may receive the first candidate prediction motion vector index and the second candidate prediction motion vector index.
- the first candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the second candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- a single syntax element may be used to identify two candidate prediction motion vector indexes.
- the video decoder may generate a list of candidate predicted motion vectors for the current PU (224).
- the video decoder may generate this candidate prediction motion vector list for the current PU in various ways.
- the video decoder may use the techniques described below with reference to FIGS. 8 to 12 to generate a list of candidate prediction motion vectors for the current PU.
- the video decoder may explicitly or implicitly set a reference picture index identifying the reference picture including the co-located PU, as described above with respect to FIG. 5.
- the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidate prediction motion vectors in the candidate prediction motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct one or more motion vectors of the current PU using the one or more motion vectors indicated by the selected candidate prediction motion vector and the one or more MVDs indicated in the code stream.
- the reference image index and prediction direction identifier of the current PU may be the same as the reference image index and prediction direction identifier of the one or more selected candidate prediction motion vectors.
- the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
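Decoder-side motion recovery (225) under the two modes described above can be sketched in a few lines: merge copies the selected candidate's motion, while AMVP adds the signaled MVD to the selected predictor. Function and mode names are assumptions for illustration.

```python
# Illustrative sketch of how the decoder determines the current PU's motion
# information: merge mode copies the candidate's motion; AMVP mode adds the
# signaled motion vector difference to the selected predictor.

def recover_motion(mode, selected_candidate_mv, mvd=None):
    if mode == "merge":
        # Merge: motion of the current PU equals the candidate's motion.
        return selected_candidate_mv
    if mode == "amvp":
        # AMVP: motion vector = selected predictor + signaled MVD.
        return (selected_candidate_mv[0] + mvd[0],
                selected_candidate_mv[1] + mvd[1])
    raise ValueError("unknown mode: %s" % mode)

merge_mv = recover_motion("merge", (7, -2))
amvp_mv = recover_motion("amvp", (7, -2), mvd=(1, 3))
```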
- FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and image blocks at adjacent positions associated with the coding unit in an embodiment of the present application, illustrating CU 250 and schematic candidate prediction motion vector positions 252A to 252E associated with CU 250.
- This application may collectively refer to the candidate prediction motion vector positions 252A to 252E as the candidate prediction motion vector positions 252.
- the candidate prediction motion vector position 252 indicates a spatial candidate prediction motion vector in the same image as the CU 250.
- the candidate prediction motion vector position 252A is positioned to the left of CU 250.
- the candidate prediction motion vector position 252B is positioned above CU 250.
- the candidate prediction motion vector position 252C is positioned at the upper right of CU 250.
- the candidate prediction motion vector position 252D is positioned at the lower left of CU 250.
- the candidate prediction motion vector position 252E is positioned at the upper left of CU 250.
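The five spatial positions just listed can be made concrete with sample coordinates. The exact sample placement below follows common HEVC-style neighbor conventions and is an assumed interpretation of FIG. 8, not taken from the patent figure itself.

```python
# Illustrative coordinates of the spatial candidate positions 252A-252E
# relative to a CU whose top-left corner is (x, y) with width w and height h.
# A: left, B: above, C: upper right, D: lower left, E: upper left (assumed
# HEVC-style placement, in sample units).

def spatial_candidate_positions(x, y, w, h):
    return {
        "252A": (x - 1, y + h - 1),  # left of the CU
        "252B": (x + w - 1, y - 1),  # above the CU
        "252C": (x + w, y - 1),      # upper right of the CU
        "252D": (x - 1, y + h),      # lower left of the CU
        "252E": (x - 1, y - 1),      # upper left of the CU
    }

# Example: a 32x32 CU at (64, 64).
positions = spatial_candidate_positions(64, 64, 32, 32)
```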
- FIG. 8 provides a schematic embodiment of a manner in which the inter prediction module 121 and the motion compensation module 162 may generate a candidate prediction motion vector list. The embodiments will be explained below with reference to the inter prediction module 121, but it should be understood that the motion compensation module 162 may implement the same technique and thus generate the same candidate prediction motion vector list.
- FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of the present application.
- the technique of FIG. 9 will be described with reference to a list including five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes.
- the five candidate prediction motion vectors may each have an index (e.g., 0 to 4).
- the technique of FIG. 9 will be described with reference to a general video decoder.
- a general video decoder may be, for example, a video encoder (e.g., video encoder 100) or a video decoder (e.g., video decoder 200).
- the video decoder first considers four spatial candidate prediction motion vectors (902).
- the four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D.
- the four spatial candidate prediction motion vectors correspond to motion information of four PUs in the same image as the current CU (for example, CU250).
- the video decoder may consider the four spatial candidate prediction motion vectors in the list in a particular order. For example, the candidate prediction motion vector position 252A may be considered first. If the candidate prediction motion vector position 252A is available, the candidate prediction motion vector position 252A may be assigned to index 0.
- if the candidate prediction motion vector position 252A is unavailable, the video decoder may not include the candidate prediction motion vector position 252A in the candidate prediction motion vector list.
- Candidate prediction motion vector positions may be unavailable for various reasons. For example, if the candidate prediction motion vector position is not within the current image, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is intra-predicted, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is in a slice different from the current CU, the candidate prediction motion vector position may not be available.
- the video decoder may next consider the candidate prediction motion vector position 252B. If the candidate prediction motion vector position 252B is available and different from the candidate prediction motion vector position 252A, the video decoder may add the candidate prediction motion vector position 252B to the candidate prediction motion vector list.
- the terms "same" and "different" refer to the motion information associated with candidate prediction motion vector positions. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information, and are considered different if they have different motion information. If the candidate prediction motion vector position 252A is not available, the video decoder may assign the candidate prediction motion vector position 252B to index 0.
- if the candidate prediction motion vector position 252A is available, the video decoder may assign the candidate prediction motion vector position 252B to index 1. If the candidate prediction motion vector position 252B is not available or is the same as the candidate prediction motion vector position 252A, the video decoder skips the candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
- the candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If the candidate prediction motion vector position 252C is available and not the same as the candidate prediction motion vector positions 252B and 252A, the video decoder assigns the candidate prediction motion vector position 252C to the next available index. If the candidate prediction motion vector position 252C is unavailable or is the same as at least one of the candidate prediction motion vector positions 252A and 252B, the video decoder does not include the candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers the candidate prediction motion vector position 252D.
- if the candidate prediction motion vector position 252D is available and not the same as the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns the candidate prediction motion vector position 252D to the next available index. If the candidate prediction motion vector position 252D is unavailable or is the same as at least one of the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include the candidate prediction motion vector position 252D in the candidate prediction motion vector list.
- the foregoing describes considering the candidate prediction motion vectors 252A to 252D individually for inclusion in the candidate prediction motion vector list, but in some embodiments, all of the candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list later.
- the candidate prediction motion vector list may include four spatial candidate prediction motion vectors or the list may include less than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, Yes), the video decoder considers temporal candidate prediction motion vectors (906).
- the temporal candidate prediction motion vector may correspond to motion information of a co-located PU of a picture different from the current picture. If a temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4.
- if the temporal candidate prediction motion vector is unavailable or is the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector considered at block 906) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, Yes), the video decoder completes building the list.
- if the candidate prediction motion vector list includes four candidate prediction motion vectors (908, No), the video decoder may consider the fifth spatial candidate prediction motion vector (910).
- the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list and assign the fifth spatial candidate prediction motion vector to index 4.
- if the candidate prediction motion vector at position 252E is unavailable or is the same as one of the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. So after considering the fifth spatial candidate prediction motion vector (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
- the video decoder finishes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
- the video decoder may consider the fifth spatial candidate prediction motion vector (918).
- the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, where the fifth spatial candidate prediction motion vector is assigned to the next available index.
- the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list.
- the video decoder may then consider the temporal candidate prediction motion vector (920). If the temporal candidate prediction motion vector is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the temporal candidate prediction motion vector to the candidate prediction motion vector list, where the temporal candidate prediction motion vector is assigned to the next available index. If the temporal candidate prediction motion vector is not available or is identical to one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include it in the candidate prediction motion vector list.
- if the candidate prediction motion vector list includes five candidate prediction motion vectors (922, Yes), the video decoder finishes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes fewer than five candidate prediction motion vectors (922, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
- an additional merge candidate prediction motion vector may be artificially generated after the spatial candidate prediction motion vectors and the temporal candidate prediction motion vector to fix the size of the merge candidate prediction motion vector list to a specified number of merge candidate prediction motion vectors (for example, five in the foregoing feasible implementation of FIG. 9).
- the additional merge candidate prediction motion vectors may include an exemplary combined bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 1), a scaled bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 2), and a zero-vector merge/AMVP candidate prediction motion vector (candidate prediction motion vector 3).
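The list-construction flow described above can be sketched as follows. This is a minimal, hypothetical illustration, not the standard's normative process; the names `MAX_MERGE_CANDS`, `build_merge_list`, and `make_artificial_candidate`, and the tuple representation of candidates, are assumptions introduced here.

```python
MAX_MERGE_CANDS = 5  # target list size in the example of FIG. 9

def build_merge_list(spatial_cands, temporal_cand, make_artificial_candidate):
    """Build a fixed-size merge candidate list: spatial candidates first,
    then the temporal candidate, then artificially generated candidates as
    filler. Duplicate or unavailable (None) candidates are skipped."""
    cand_list = []
    for cand in spatial_cands:                 # e.g. positions 252A-252E
        if cand is not None and cand not in cand_list:
            cand_list.append(cand)
        if len(cand_list) == MAX_MERGE_CANDS:
            return cand_list
    if temporal_cand is not None and temporal_cand not in cand_list:
        cand_list.append(temporal_cand)
    # Fill with artificially generated candidates until the list is full
    # (blocks 914/916 in the flow above).
    while len(cand_list) < MAX_MERGE_CANDS:
        cand_list.append(make_artificial_candidate(cand_list))
    return cand_list

cands = build_merge_list(
    spatial_cands=[(1, 0), (1, 0), (2, 3), None],  # one duplicate, one unavailable
    temporal_cand=(4, 4),
    make_artificial_candidate=lambda lst: (0, 0) if (0, 0) not in lst else (9, 9),
)
```

The filler callback stands in for the three artificial candidate types (combined, scaled, zero-vector) described next.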
- FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the combined bi-directional predictive merge candidate prediction motion vector may be generated by combining the original merge candidate prediction motion vector.
- two candidate prediction motion vectors (which have mvL0 and refIdxL0 or mvL1 and refIdxL1) among the original candidate prediction motion vectors may be used to generate a bidirectional predictive merge candidate prediction motion vector.
- two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
- the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
- the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
- mvL0_A and ref0 are picked from list 0
- mvL1_B and ref0 are picked from list 1
- a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may be generated, and it is checked whether it is different from the candidate prediction motion vectors already included in the candidate prediction motion vector list. If it is different, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
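The combination step above can be sketched as follows, assuming each candidate is represented as a dict with optional list-0 and list-1 motion; the field names `"L0"`/`"L1"` and the (mv, refIdx) tuples are illustrative assumptions, not from the source.

```python
def combine_bi_predictive(cand_a, cand_b):
    """Combine the list-0 motion of one uni-directional candidate with the
    list-1 motion of another into a single bi-predictive candidate; return
    None if either side lacks the required motion."""
    if cand_a.get("L0") is None or cand_b.get("L1") is None:
        return None
    return {"L0": cand_a["L0"], "L1": cand_b["L1"]}

# As in FIG. 10: mvL0_A/ref0 picked from list 0, mvL1_B/ref0 picked from list 1.
a = {"L0": ((3, -1), 0), "L1": None}   # (mvL0_A, ref0), list-0 uni-prediction
b = {"L0": None, "L1": ((-2, 4), 0)}   # (mvL1_B, ref0), list-1 uni-prediction
combined = combine_bi_predictive(a, b)
```

As the text notes, the combined candidate would then be checked against the list before being added.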
- FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the scaled bi-directional predictive merge candidate prediction motion vector may be generated by scaling the original merge candidate prediction motion vector.
- a candidate prediction motion vector (which may have mvLX and refIdxLX) among the original candidate prediction motion vectors may be used to generate a bi-predictive merge candidate prediction motion vector.
- two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
- the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
- the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
- mvL0_A and ref0 may be picked from list 0, and ref0 may be copied to the reference index ref0 ′ in list 1. Then, mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0 ′. The scaling may depend on the POC distance.
- a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) can be generated and checked for duplication. If it is not a duplicate, it can be added to the merge candidate prediction motion vector list.
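The POC-distance scaling used to derive mvL0'_A from mvL0_A can be illustrated with simple linear scaling by the ratio of the two picture-order-count distances. This is a sketch of the idea only; the H.265 standard uses a clipped fixed-point approximation of the same ratio, and the POC values below are invented for the example.

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector by the ratio of POC distances: the distance to
    the target reference (ref0') over the distance to the source reference
    (ref0) that the original vector points at."""
    td = poc_cur - poc_ref_src   # distance to the reference mv points at
    tb = poc_cur - poc_ref_dst   # distance to the copied reference ref0'
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# mvL0_A points one picture back; ref0' is two pictures back, so the
# scaled vector doubles.
mvL0_A = (4, -2)
mvL0p_A = scale_mv(mvL0_A, poc_cur=8, poc_ref_src=7, poc_ref_dst=6)  # -> (8, -4)
```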
- FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the zero vector merge candidate prediction motion vector may be generated by combining the zero vector with a reference index that can be referred to. If the zero vector candidate prediction motion vector is not duplicated, it can be added to the merge candidate prediction motion vector list. For each generated merge candidate prediction motion vector, the motion information may be compared with the motion information of the previous candidate prediction motion vector in the list.
- if the motion information differs, the generated candidate prediction motion vector is added to the merge candidate prediction motion vector list.
- the process of determining whether the candidate prediction motion vector is different from the candidate prediction motion vector already included in the candidate prediction motion vector list is sometimes referred to as pruning.
- each newly generated candidate prediction motion vector can be compared with existing candidate prediction motion vectors in the list.
- the pruning operation may include comparing one or more new candidate prediction motion vectors with the candidate prediction motion vectors already in the candidate prediction motion vector list and not adding a new candidate prediction motion vector that duplicates a candidate prediction motion vector already in the list.
- the pruning operation may include adding one or more new candidate prediction motion vectors to the candidate prediction motion vector list and later removing duplicate candidate prediction motion vectors from the list. It should be understood that, in other feasible implementation manners, the foregoing pruning step may not be performed.
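The first pruning variant above (compare before adding) reduces to a one-line membership check; this sketch assumes candidates compare equal exactly when their motion information is identical.

```python
def prune_and_add(cand_list, new_cand):
    """Append new_cand only if no identical candidate is already present;
    duplicates are silently dropped (pruned)."""
    if new_cand not in cand_list:
        cand_list.append(new_cand)
    return cand_list

lst = [(1, 1)]
prune_and_add(lst, (1, 1))   # duplicate: not added
prune_and_add(lst, (2, 0))   # new motion information: added
```

The second variant (add first, deduplicate later) gives the same final list contents at the cost of a temporary over-full list.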
- the spatial candidate prediction modes are exemplified by the five positions 252A to 252E shown in FIG. 8.
- the spatial candidate prediction modes may further include, for example, positions that are within a preset distance from the image block to be processed but are not adjacent to the image block to be processed. Exemplarily, such positions may be shown as 252F to 252J in FIG. 13.
- FIG. 13 is an exemplary schematic diagram of a coding unit and the image blocks at positions associated with the coding unit in this embodiment of the present application. Positions that are located in image blocks in the same image frame as the image block to be processed, that have been reconstructed when the image block to be processed is processed, and that are not adjacent to the image block to be processed fall within the range of such positions.
- FIG. 14 is an exemplary flowchart of a method for predicting motion information according to an embodiment of the present application. Specifically, the method includes the following steps:
- the target pixel point includes a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed.
- FIG. 8 schematically shows the adjacent positions 252A-252E of the coding unit 250
- FIG. 13 schematically shows the non-adjacent positions 252F-252J of the coding unit 250.
- the foregoing position may be used to indicate both an image block covering the position and a pixel point at the position.
- the image block to be processed in the embodiment of the present application is a set of pixels to be processed, and is not limited to a coding unit, a coding subunit, or a prediction unit.
- the coding unit 250 may be used as the image block to be processed in this embodiment of the present application.
- the left side of the image block to be processed includes the position directly to the left of the image block to be processed (such as the position corresponding to 252A), and may also include the upper left (such as the position corresponding to 252D) and the lower left (such as the position corresponding to 252E).
- the position of the pixel point at the upper left vertex of the image block to be processed is set as the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with the rightward direction as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with the downward direction as the vertical positive direction, establishing a rectangular coordinate system.
- the position of the second candidate pixel point in this embodiment of the present application may be at least one of the following coordinate points in the coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where
- w and h are preset positive integers
- i is a positive integer
- j is a non-negative integer.
- the position of the second candidate pixel point may not include the upper left position of the image block to be processed; that is, the position of the second candidate pixel point may be at least one of the following coordinate points in the coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j+h-1), (-w×i-1, h×j+h-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
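The coordinate family above can be enumerated concretely. The function below is an illustration written for this document: the small ranges of i and j, and the use of a set of (x, y) tuples, are assumptions for demonstration; the coordinate formulas themselves are those listed in the text.

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the second-candidate coordinate points listed above, in the
    coordinate system with the block's top-left pixel as origin (x positive
    rightward, y positive downward)."""
    pts = set()
    for i in range(1, i_max + 1):            # i is a positive integer
        pts.add((-1, h * i - 1 + h))
        pts.add((-1, h * i + h))
        for j in range(0, j_max + 1):        # j is a non-negative integer
            pts.add((-w * i, h * j - 1))
            pts.add((-w * i - 1, h * j - 1))
            pts.add((-w * i, h * j))
            pts.add((-w * i - 1, h * j))
    return pts

# For an 8x8 block with i up to 1 and j up to 1:
pts = second_candidate_positions(w=8, h=8, i_max=1, j_max=1)
```

Every generated point has a negative abscissa, i.e. lies to the left of the block, consistent with the definition of the second candidate pixel point.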
- after the motion information of an image block is determined, it is stored in a motion vector matrix for use in processing subsequent image blocks.
- the entire frame of the image can be mapped to a set of pixel units with each 4x4 pixel point set as one pixel unit, where each 4x4 pixel point set corresponds to one piece of motion information; the motion information corresponding to each 4x4 pixel unit can then be extracted to form a motion information matrix corresponding to the original image, which can also be called a motion vector field.
- the matrix obtained through the above process, that is, by sampling the motion information matrix corresponding to the image where the image block to be processed is located, is referred to as a motion vector field in this embodiment of the present application, where w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field. It should be understood that, in this embodiment, the determination of w is independent of the width of the image block to be processed, and the determination of h is independent of the height of the image block to be processed.
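The sampling described above amounts to keeping every h-th row and every w-th column of the dense per-4x4-unit motion matrix. The sketch below is illustrative: the dense matrix is synthetic, and representing motion information as (row, col) tuples is an assumption made for the example.

```python
def sample_mv_field(mv_matrix, w, h):
    """Subsample a dense motion matrix with width interval w and height
    interval h (intervals are in 4x4-unit steps)."""
    return [row[::w] for row in mv_matrix[::h]]

# A 4x4-unit motion matrix for a tiny frame, sampled with w = h = 2.
dense = [[(r, c) for c in range(4)] for r in range(4)]
field = sample_mv_field(dense, w=2, h=2)   # 2x2 sampled motion vector field
```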
- this step includes: determining a coding unit among the previously reconstructed image blocks of the image where the image block to be processed is located, where the coding unit is located on the left side of the image block to be processed and is not adjacent to the image block to be processed.
- the pixel point at the lower left corner of the image block to be processed may be used as a reference point, and the straight line where the lower edge of the image block to be processed is located may be used as a reference straight line. One or more anchor points located to the left of the reference point and on the reference straight line are determined. The coding unit (or prediction unit) where each anchor point is located is determined, and at least one of the adjacent point above the upper left corner point of the coding unit and the adjacent point above the upper right corner point of the coding unit is taken as a target pixel point.
- a plurality of derived straight lines parallel to the reference straight line are sequentially determined according to a preset step size, and the derived straight lines are located below the image block to be processed. Taking the derived straight line as a new reference straight line, and taking the intersection of the derived straight line and the straight line at the left edge of the image block to be processed as a new reference point, repeating the steps of determining a target pixel point to obtain at least one new target pixel.
- the target pixel point is no longer obtained repeatedly.
- this embodiment determines the positions of at least two target pixel points having a preset positional relationship with the image block to be processed; the order of acquisition follows the preset order described below.
- the position range of the second candidate pixel point needs to be limited.
- the absolute value of the abscissa of the second candidate pixel point cannot exceed a boundary value; that is, w×i is less than or equal to the first threshold.
- the first threshold is equal to a width of a coding tree unit CTU in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- the embodiment of the present application does not limit the position of the first candidate pixel point adjacent to the image block to be processed.
- it may be a point at any one or more positions indicated by 252A-252E in FIG. 8.
- the acquiring at least two target pixel points having a preset positional relationship with the image block to be processed includes: acquiring, in accordance with a preset order, a plurality of second candidate pixel points among the at least two target pixel points.
- when variable-length encoding is performed on the index information of the second candidate pixel points, the order of acquisition is related to the bit consumption of the encoded index information. For example, the number of bits used to encode the index information of a second candidate pixel point acquired earlier is less than or equal to the number of bits used for a second candidate pixel point acquired later; that is, when the second candidate pixel point acquired earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the second candidate pixel point acquired later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
- the binary representation may be an encoded codeword, and the length of the binary representation is the codeword length of the encoded codeword.
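A truncated-unary binarization is one way to realize the property that earlier-acquired candidates never cost more bits; the specific code below is an assumption chosen for illustration, not the codeword table of any particular standard.

```python
def unary_codeword(index, max_index):
    """Truncated unary binarization: index k -> k ones followed by a
    terminating zero; the zero is dropped for the last possible index."""
    if index == max_index:
        return "1" * index
    return "1" * index + "0"

# Codewords for a five-entry candidate list (indices 0..4).
codewords = [unary_codeword(k, 4) for k in range(5)]
```

Because codeword length grows with the index, assigning earlier-acquired candidates smaller indices guarantees P ≤ Q as stated above.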
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or the order from right to left; or the order from top to bottom; or a polyline order (for example, from top right to bottom left).
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- FIG. 16 is a schematic diagram of the right-to-left order, in which positions are acquired row by row from top to bottom, and one by one from right to left within each row.
- FIG. 17 is a schematic diagram of the top-to-bottom order, in which positions are acquired column by column from right to left, and one by one from top to bottom within each column.
- Figure 18 is a schematic diagram of the polyline order from top right to bottom left. The numbers in the figure represent the order of acquisition: the smaller the number, the earlier the position is acquired. It should be understood that when the second candidate pixel point at a certain position is not acquired, that position is skipped, and the other positions are still acquired in ascending order of their numbers.
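The short-to-long distance order defined above can be implemented as a sort on |x| + |y|. The positions used below are invented for the example, and tie handling (Python's stable sort keeps input order for equal distances) is an arbitrary choice not specified by the text.

```python
def distance_order(positions):
    """Sort candidate positions by the sum of absolute coordinates in the
    block's coordinate system (short-to-long distance order)."""
    return sorted(positions, key=lambda p: abs(p[0]) + abs(p[1]))

# Distances: (-9,8) -> 17, (-1,15) -> 16, (-8,0) -> 8, (-1,7) -> 8.
order = distance_order([(-9, 8), (-1, 15), (-8, 0), (-1, 7)])
```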
- the obtained position of the target pixel point or the motion information corresponding to the target pixel point is sent to a candidate motion information list.
- the specific implementation manner may be the same as the construction method of the candidate motion vector list used in the Merge or AMVP mode in the aforementioned H.265 standard technology.
- a pruning operation is performed; that is, the acquiring at least two target pixel points having a preset positional relationship with the image block to be processed includes: sequentially acquiring candidate pixel points having the preset positional relationship with the image block to be processed; determining that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and using the candidate pixel point with different motion information as a target pixel point.
- this is the pruning operation described in the foregoing, and details are not described again.
- the position of the target pixel point or the motion information corresponding to the target pixel point may be directly added to the candidate motion information list; that is, no pruning operation is performed.
- among the obtained at least two target pixel points, there may be a case where the motion information of at least two target pixel points is the same, and there may also be a case where the motion information of any two target pixel points is different.
- the number of the obtained target pixel points may be limited to a preset second threshold.
- the selection of the second threshold may be determined according to a specific implementation. For example, the second threshold is reasonably determined to ensure that the number of target pixel points obtained is a fixed value, or that the number of target pixel points added to the candidate motion information list is a fixed value, or that the total amount of motion information in the candidate motion information list is a fixed value.
- the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points. When the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N; when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, and N is less than or equal to M.
- the target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, and different motion information is distinguished by different index numbers. Different index numbers have different binary representations, and the binary representations can be encoded codewords. In the embodiment of the present application, the length of the encoded codeword corresponding to the index number of the first candidate pixel point is less than or equal to the length of the encoded codeword corresponding to the index number of the second candidate pixel point.
- the embodiment of the present application adds new candidate motion information. Therefore, the embodiment of the present application can be used to improve the Merge technology or the AMVP technology.
- similar to the Merge technology, the target motion information may be used as the motion information of the image block to be processed; similar to the AMVP technology, the target motion information may be used as a prediction value of the motion information of the image block to be processed, and the motion information of the image block to be processed is obtained by combining the prediction value with the motion information difference.
- the method may be used to decode an image block to be processed.
- the method further includes: parsing a code stream to obtain target motion residual information; and correspondingly, the predicting the motion information of the image block to be processed based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the motion information in the embodiment of the present application may refer to a motion vector.
- this step is to add the prediction value of the motion vector of the image block to be processed, indicated by the target identification information, and the residual value of the motion vector obtained by parsing, to obtain the motion vector of the image block to be processed.
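This step reduces to a component-wise addition of predictor and residual; the concrete values below are invented for the example.

```python
def reconstruct_mv(mvp, mvd):
    """Add the motion vector prediction value (selected by the target
    identification information) and the parsed motion vector residual."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mv = reconstruct_mv(mvp=(4, -2), mvd=(-1, 3))   # -> (3, 1)
```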
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method may be used to encode the image block to be processed; before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least encoding cost; and correspondingly, the obtaining the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the method further includes encoding the acquired target identification information and encoding the target motion residual information.
- FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of the present application. Specifically, the method includes the following steps:
- the target pixel point includes a candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed, and when the prediction mode of the image block where the target pixel point is located is intra prediction, the target pixel point is unavailable.
- the determination of availability is based on factors such as the prediction mode of the image block where the target pixel point is located, whether the target pixel point is within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vector corresponding to another position (for example, in the H.265 standard, for the rectangular block mode, when the candidate prediction block of the Merge mode is determined).
- when the prediction mode of the image block where the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point is located outside the edge of the image or the edge of the slice where the image block to be processed is located, the target pixel point is also unavailable.
- the preset positional relationship may include an adjacent positional relationship and a non-adjacent positional relationship with the image block to be processed, as shown in FIG. 8 and FIG. 13, respectively.
- the second candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed has been discussed in detail; the target pixel point includes the second candidate pixel point in the embodiment shown in FIG. 14, and details are not described herein again.
- the availability of the target pixel point may be determined, that is, the prediction mode of the image block corresponding to the coordinates of the target pixel point is checked.
- alternatively, the availability of the image block where the target pixel point is located may be determined, that is, the prediction mode of the image block indicated by the coordinates of a point is checked. This point can be the upper left corner point of the image block, the center point of the image block, or the target pixel point, which is not limited.
- the condition for determining availability includes: when the prediction mode of the image block where the target pixel point is located is intra prediction, the target pixel point is unavailable. It should be understood that when the position of the target pixel point is outside the edge of the image, or outside the edge of the slice, the target pixel point does not actually exist, or its value is obtained by derivation rather than reconstruction; in a feasible implementation manner, such a target pixel point is also considered unavailable.
- the position of the candidate pixel point includes: taking the position of the pixel point at the upper left vertex of the image block to be processed as the origin, the straight line where the upper edge of the image block to be processed is located as the horizontal axis with the rightward direction as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located as the vertical axis with the downward direction as the vertical positive direction, at least one of the following coordinate points in the rectangular coordinate system.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling the motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w ⁇ i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the candidate motion information set includes: adding motion information corresponding to a plurality of available candidate pixel points to the candidate motion information set of the image block to be processed according to a preset order, where, when a candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- the candidate motion information set includes at least two identical motion information.
- the adding the available motion information corresponding to the target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and adding the available target pixel point with different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the target identification information is used to determine target motion information from the candidate motion information set.
- the target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, and different motion information is distinguished by different index numbers.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting the motion information of the image block to be processed based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining target identification information includes: obtaining identification information of the target motion information with the least coding cost among the at least two target motion information.
- the method further includes encoding the acquired target identification information.
- the method further includes encoding the target motion residual information.
- FIG. 32 is an exemplary structural block diagram of a motion information prediction device 3200 according to an embodiment of the present application, and specifically includes the following modules:
- An obtaining module 3201 is configured to obtain at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and the target pixel point located at the to-be-processed image block. Processing a second candidate pixel point on the left side of the image block and not adjacent to the image block to be processed;
- the indexing module 3202 is configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points; when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M;
- a calculation module 3203 is configured to predict motion information of the image block to be processed according to the target motion information.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
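When the identification information is binarized with a variable-length code, the N ≤ M property above falls out naturally: candidates listed earlier receive codewords no longer than candidates listed later. The sketch below uses truncated unary coding as one hypothetical binarization; the binarization a real codec uses may differ.

```python
def truncated_unary(index, max_index):
    """Binarize `index` as truncated unary: `index` ones followed by a
    terminating zero, with the zero omitted for the last index."""
    if index < max_index:
        return "1" * index + "0"
    return "1" * index

# Codeword lengths never decrease with the candidate index, so an
# adjacent first candidate placed early gets a representation of
# length N that is at most the length M of a later, non-adjacent one.
codewords = [truncated_unary(i, 4) for i in range(5)]
lengths = [len(c) for c in codewords]
```

With this binarization, placing the adjacent candidate before the non-adjacent ones guarantees N ≤ M for every possible choice of target motion information.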
- the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
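The coordinate templates above can be enumerated directly. A minimal sketch, assuming w and h are either the block dimensions or the motion vector field sampling intervals, and using the stated convention (origin at the upper left vertex, x positive to the right, y positive downward):

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the listed coordinate templates for i in 1..i_max and
    j in 0..j_max; every point lies to the left of the block (x < 0)."""
    points = []
    for i in range(1, i_max + 1):
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(j_max + 1):
            points.append((-w * i, h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i, h * j))
            points.append((-w * i - 1, h * j))
    return points

pts = second_candidate_positions(w=8, h=8, i_max=2, j_max=1)
```

The limits i_max and j_max are hypothetical parameters standing in for whatever range constraint (such as the first threshold below) the codec imposes.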
- w×i is less than or equal to the first threshold.
- the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- the obtaining module 3201 is specifically configured to obtain, in a preset order, multiple second candidate pixel points among the at least two target pixel points, wherein, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q;
- the preset order includes: a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or, the order from right to left; or, the order from top to bottom; or, a polyline order from upper right to lower left.
- the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
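Both distance definitions above can drive the short-to-long ordering: the sum of absolute coordinates (an L1 distance to the origin) or the Euclidean length of the segment to the lower left vertex, which in the stated coordinate system sits at (0, h). A hypothetical sketch:

```python
import math

def order_by_distance(points, h, metric="l1"):
    """Sort candidate points from short to long distance. "l1" sums the
    absolute coordinates; "segment" measures the straight line to the
    lower left vertex of the block, located at (0, h)."""
    if metric == "l1":
        key = lambda p: abs(p[0]) + abs(p[1])
    else:
        key = lambda p: math.hypot(p[0], p[1] - h)
    return sorted(points, key=key)

ordered = order_by_distance([(-9, 0), (-1, 15), (-8, 7)], h=8)
```

Because candidates earlier in this order receive shorter codewords, the metric chosen directly shapes the codeword assignment described above.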
- among the obtained target pixel points, the motion information of at least two target pixel points is the same.
- the obtaining module 3201 is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel points with different motion information as the target pixel points.
- the number of the obtained target pixel points is a preset second threshold.
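The pruning rule described above — admit a candidate only when its motion information differs from all already-admitted candidates, and stop once the preset second threshold is reached — can be sketched with motion vectors standing in for full motion information:

```python
def prune_candidates(candidate_mvs, second_threshold):
    """Keep candidate motion vectors that differ from every previously
    kept one, until `second_threshold` entries have been collected."""
    kept = []
    for mv in candidate_mvs:
        if mv not in kept:
            kept.append(mv)
        if len(kept) == second_threshold:
            break
    return kept

pruned = prune_candidates([(1, 0), (1, 0), (2, -1), (1, 0), (0, 3)], 3)
```

In a full codec the comparison would cover the complete motion information (motion vector plus reference index), not just the vector used in this sketch.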
- the calculation module 3203 is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device 3200 is configured to decode the image block to be processed, and the indexing module 3202 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3203 is specifically configured to: combine the target motion information and the target motion residual information to obtain motion information of the image block to be processed.
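For motion vectors, combining the target motion information with the parsed motion residual reduces to a component-wise addition, as in AMVP-style prediction. A minimal sketch of the decoder-side combination performed by the calculation module:

```python
def combine_motion_info(target_mv, residual_mv):
    """Reconstruct the motion vector of the block to be processed from
    the target motion vector (predictor) and the parsed residual (MVD)."""
    return (target_mv[0] + residual_mv[0], target_mv[1] + residual_mv[1])

mv = combine_motion_info(target_mv=(4, -2), residual_mv=(-1, 3))
```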
- the indexing module 3202 is specifically configured to: parse the code stream to obtain the target identification information.
- the device 3200 is configured to encode the image block to be processed, and the obtaining module 3201 is further configured to: determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly,
- the indexing module 3202 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module 3202 is further configured to encode the obtained target identification information.
- the indexing module 3202 is further configured to: encode the target motion residual information.
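On the encoder side, the modules above search for the predictor/residual combination with the least coding cost. The sketch below uses a toy cost model (identification codeword bits plus the L1 magnitude of the residual); a real encoder would use a rate-distortion cost instead:

```python
def choose_cheapest(candidates, codeword_bits, true_mv):
    """Return (cost, index, residual) of the candidate predictor whose
    codeword bits plus residual magnitude is smallest."""
    best = None
    for idx, mvp in enumerate(candidates):
        residual = (true_mv[0] - mvp[0], true_mv[1] - mvp[1])
        cost = codeword_bits[idx] + abs(residual[0]) + abs(residual[1])
        if best is None or cost < best[0]:
            best = (cost, idx, residual)
    return best

cost, idx, residual = choose_cheapest(
    candidates=[(0, 0), (3, 1)], codeword_bits=[1, 2], true_mv=(3, 2))
```

In this toy example the second candidate wins despite its longer codeword, because its residual is much cheaper to signal.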
- FIG. 33 is another exemplary structural block diagram of the motion information prediction device 3300 in the embodiment of the present application, and specifically includes the following modules:
- a detection module 3301 is configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of the image block to be processed and not adjacent to it; when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable;
- An obtaining module 3302 configured to add available motion information corresponding to the target pixel point to a candidate motion information set of the image block to be processed;
- An indexing module 3303 configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set;
- a calculation module 3304 is configured to predict motion information of the image block to be processed according to the target motion information.
- the detection module 3301 is specifically configured to determine availability of an image block where the target pixel point is located.
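The availability rule states that a target pixel point is unavailable when the block containing it was intra predicted, since such a block carries no motion information. A sketch of the per-block check performed by the detection module (treating a block outside the picture as unavailable is an added assumption here, not stated in this passage):

```python
def is_available(block):
    """A target pixel point is available only when the block containing
    it exists and was not intra predicted."""
    if block is None:  # e.g. the pixel falls outside the picture
        return False
    return block.get("mode") != "intra"

blocks = [{"mode": "inter", "mv": (1, 0)}, {"mode": "intra"}, None]
flags = [is_available(b) for b in blocks]
```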
- the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w×i is less than or equal to the first threshold.
- the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
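The first threshold caps how far to the left candidates may lie: w×i must not exceed the CTU width (or twice it), which bounds the motion-information buffer a decoder must retain. A sketch of the largest admissible i under that rule, with illustrative numbers:

```python
def max_column_index(w, ctu_width, factor=1):
    """Largest i satisfying w * i <= factor * ctu_width, i.e. the
    leftmost admissible column of second candidate pixel points."""
    return (factor * ctu_width) // w

i_one = max_column_index(w=8, ctu_width=128)            # threshold = CTU width
i_two = max_column_index(w=8, ctu_width=128, factor=2)  # threshold = 2 x CTU width
```

The 8-pixel sampling interval and 128-wide CTU are illustrative values, not figures taken from this application.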
- the obtaining module 3302 is specifically configured to: add, in a preset order, the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed, wherein, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is the length of the straight line segment connecting the candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
- the candidate motion information set includes at least two pieces of identical motion information.
- the obtaining module 3302 is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of the currently obtained available target pixel point is different from the motion information in the candidate motion information set; and add the motion information of the available target pixel points with different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the calculation module 3304 is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device 3300 is configured to decode the image block to be processed, and the indexing module 3303 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3304 is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the indexing module 3303 is specifically configured to: parse the code stream to obtain the target identification information.
- the device 3300 is configured to encode the image block to be processed, and the obtaining module 3302 is further configured to: determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly,
- the indexing module 3303 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module 3303 is further configured to: encode the obtained target identification information.
- the indexing module 3303 is further configured to: encode the target motion residual information.
- FIG. 34 is a schematic structural block diagram of a motion information prediction device 3400 in an embodiment of the present application. Specifically, it includes a processor 3401 and a memory 3402 coupled to the processor; the processor 3401 is configured to perform the method of the embodiment shown in FIG. 14 or FIG. 32 and its various feasible implementations.
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit.
- the computer-readable medium may include a computer-readable storage medium or a communication medium; the computer-readable storage medium corresponds to a tangible medium such as a data storage medium, and the communication medium includes any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
- computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) a communication medium such as a signal or carrier wave.
- a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
- the computer program product may include a computer-readable medium.
- the computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
- if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media.
- magnetic disks and optical discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
- functionality described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
- the techniques of this application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
- Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Abstract
A method for predicting motion information concerning an image block. Said method comprises: determining the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, the target pixel point comprising candidate pixel points that are located at the left side of said image block and are not adjacent to said image block, and when a prediction mode of the image block where the target pixel point is located is intra-frame prediction, the target pixel point being unavailable; adding motion information corresponding to the available target pixel point into a candidate motion information set of said image block to be processed; acquiring target identification information, the target identification information being used to determine target motion information from the candidate motion information set; and predicting, according to the target motion information, the motion information concerning said image block.
Description
The present application relates to the field of video image technology, and in particular, to an inter prediction method and device.
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and in extensions of those standards, to transmit and receive digital video information more efficiently. Video devices can implement these video codec techniques to transmit, receive, encode, decode, and/or store digital video information more efficiently.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video decoding, a video slice may be partitioned into video blocks, which may also be referred to as tree blocks, coding units (CUs), and/or decoding nodes. Video blocks in an intra-decoded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-decoded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Summary of the invention
The embodiments of the present application provide an inter prediction method and apparatus that select suitable candidate motion information as the motion information predictor of an image block to be processed, which improves the effectiveness of motion information prediction and the efficiency of encoding and decoding.
It should be understood that, in general, motion information includes a motion vector and index information of the reference frame to which the motion vector points, among other things. In a feasible implementation of the embodiments of the present application, predicting motion information refers to predicting a motion vector.
According to a first aspect of the embodiments of the present application, a method for predicting motion information of an image block is provided, including: obtaining at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of the image block to be processed and not adjacent to it; obtaining target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and predicting the motion information of the image block to be processed according to the target motion information.
A beneficial effect of this implementation is that the motion information of non-adjacent image blocks on the left side of the block to be processed is used as candidate motion information for the block, which exploits more spatial prior coding information and improves coding performance.
In a feasible implementation, the binary representation of the target identification information includes the encoded codeword of the target identification information.
A beneficial effect of this implementation is that, when the candidate predicted motion information is represented with variable-length coding, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Appropriately determining the acquisition order of the target pixel points, according to the correlation between their motion information and the motion information of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
A beneficial effect of this implementation is that it offers multiple possibilities for selecting the second candidate pixel point according to actual coding requirements, allowing a balance between performance, complexity, and hardware and software cost.
In a feasible implementation, w is the width of the image block to be processed and h is its height.
A beneficial effect of this implementation is that selecting the position of the second candidate pixel point according to the size of the image block to be processed matches the local motion characteristics of the block, making the selection more reasonable.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
A beneficial effect of this implementation is that the selection of second candidate pixel positions is kept consistent with the distribution of motion information in the motion vector field, ensuring balanced position selection.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
A beneficial effect of this implementation is that restricting the selection range of second candidate pixel positions ensures a balance between coding performance and storage space.
In a feasible implementation, there are multiple second candidate pixel points, and obtaining the at least two target pixel points having a preset positional relationship with the image block to be processed includes: obtaining, in a preset order, the multiple second candidate pixel points among the at least two target pixel points, where, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
In a feasible implementation, the preset order includes: a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or, the order from right to left; or, the order from top to bottom; or, a polyline order from upper right to lower left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
A beneficial effect of this implementation is that, when variable-length coding is used for the motion information corresponding to the second candidate pixel points, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Appropriately determining the acquisition order, according to the correlation between the motion information of the second candidate pixel points and the motion information of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
In a feasible implementation, among the at least two obtained target pixel points, the motion information of at least two target pixel points is the same.
A beneficial effect of this implementation is that no pruning operation is performed when constructing the candidate motion information list, which saves complexity.
In a feasible implementation, obtaining the at least two target pixel points having a preset positional relationship with the image block to be processed includes: sequentially obtaining candidate pixel points having the preset positional relationship with the image block to be processed; determining that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and using the candidate pixel points with different motion information as the target pixel points.
A beneficial effect of this implementation is that the pruning operation removes redundant information from the candidate motion information list and improves coding efficiency.
In a feasible implementation, the number of obtained target pixel points is a preset second threshold.
A beneficial effect of this implementation is that limiting the number of obtained target pixel points balances coding performance against hardware and software cost, and in some specific implementations also avoids the decoder instability caused by an uncertain total number of entries in the candidate motion information list.
在一种可行的实施方式中,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:将所述目标运动信息作为所述待处理图像块的运动信息。In a feasible implementation manner, the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
在一种可行的实施方式中,所述方法用于解码所述待处理图像块,还包括:解析码流以获得目标运动残差信息;对应的,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。In a feasible implementation manner, the method is used to decode the image block to be processed, further comprising: analyzing a code stream to obtain target motion residual information; and correspondingly, predicting the target motion information based on the target motion information. The motion information of the image block to be processed includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the image block to be processed, and before the obtaining target identification information, the method further includes: determining the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the obtaining target identification information includes: obtaining the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the obtained target identification information is encoded.
In a feasible implementation, the target motion residual information is encoded.
The foregoing feasible implementations apply the motion vector prediction method of this application to decoding and encoding methods for obtaining the motion vector of an image block to be processed, namely the merge prediction mode (Merge) and the advanced motion vector prediction (AMVP) mode, improving the coding performance and efficiency of the original methods.
In a second aspect of the embodiments of this application, a method for predicting motion information of an image block is provided, including: determining the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel point includes a candidate pixel point that is located to the left of the image block to be processed and is not adjacent to it, and where the target pixel point is unavailable when the prediction mode of the image block in which it is located is intra prediction; adding the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the image block to be processed according to the target motion information.
A beneficial effect of this implementation is that the motion information of non-adjacent image blocks to the left of the block to be processed is used as candidate motion information of the block, exploiting more spatial prior coding information and improving coding performance.
In a feasible implementation, the determining the availability of at least one target pixel point having a preset positional relationship with the image block to be processed includes: determining the availability of the image block in which the target pixel point is located.
It should be understood that the availability determination is based on factors such as the prediction mode of the image block in which the target pixel point is located, whether the target pixel point lies within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vectors corresponding to other positions (for example, the way candidate prediction blocks of the Merge mode are determined for rectangular partitioning in the H.265 standard). In a feasible implementation, in general, the target pixel point is available when the prediction mode of the image block in which it is located is inter prediction; however, the target pixel point is also unavailable when it lies beyond the edge of the image, or beyond the edge of the slice, containing the image block to be processed.
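The availability rules just described can be illustrated with the following sketch. All names, the string mode constants, and the rectangular slice representation are assumptions for illustration only; the actual determination in a codec also involves the partitioning-dependent factors mentioned above, which are omitted here.

```python
INTRA, INTER = "intra", "inter"

def target_available(x, y, block_mode, pic_w, pic_h, slice_rect):
    """Illustrative availability test for a target pixel at picture
    coordinates (x, y). slice_rect = (x0, y0, x1, y1), half-open."""
    if not (0 <= x < pic_w and 0 <= y < pic_h):
        return False  # outside the edge of the image
    x0, y0, x1, y1 = slice_rect
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False  # outside the edge of the slice
    return block_mode == INTER  # intra-coded blocks carry no motion information
```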
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
A beneficial effect of this implementation is that it offers multiple possibilities for selecting candidate pixel points according to actual coding requirements, achieving a balance among performance, complexity, and software and hardware cost.
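The coordinate formulas above can be enumerated directly. The sketch below generates the listed left-side candidate positions for given w, h and index ranges; the function name, the flat list return type, and the choice of upper bounds for i and j are assumptions for illustration.

```python
def candidate_positions(w, h, max_i, max_j):
    """Coordinate system: origin at the block's top-left pixel,
    x positive rightward, y positive downward."""
    points = []
    for i in range(1, max_i + 1):          # i is a positive integer
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, max_j + 1):      # j is a non-negative integer
            points.append((-w * i,     h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i,     h * j))
            points.append((-w * i - 1, h * j))
    return points
```

With w and h set to the block's width and height, positions for which w×i exceeds the first threshold (for example, the CTU width) would additionally be excluded, in line with the range restriction described later in this section.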
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
A beneficial effect of this implementation is that selecting the positions of candidate pixel points according to the size of the image block to be processed matches the local motion characteristics of the block, making the selection more reasonable.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
A beneficial effect of this implementation is that the selection of candidate pixel positions is kept consistent with the distribution of motion information in the motion vector field, ensuring a balanced choice of positions.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
A beneficial effect of this implementation is that restricting the range from which candidate pixel positions are selected ensures a balance between coding performance and storage space.
In a feasible implementation, there are multiple candidate pixel points and the multiple candidate pixel points are available, and the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: adding the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
A beneficial effect of this implementation is that, when the motion information corresponding to each candidate pixel point is represented with variable-length coding, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Deciding the acquisition order appropriately, according to the correlation between the motion information of the candidate pixel points and that of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
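The short-to-long distance order and its codeword-length consequence can be illustrated as follows. Truncated-unary binarization is used here only as an assumed example of a variable-length index code; the text does not mandate this particular scheme.

```python
def order_by_distance(points):
    """Sort candidate positions by |x| + |y| (sum of absolute coordinates
    in the block's coordinate system), shortest distance first."""
    return sorted(points, key=lambda p: abs(p[0]) + abs(p[1]))

def truncated_unary_length(index, list_size):
    """Bins needed for a candidate index under truncated-unary coding:
    index k costs k + 1 bins, except the last index, which saves one."""
    return min(index + 1, list_size - 1)

ordered = order_by_distance([(-9, 0), (-1, 3), (-5, -1)])
# earlier (closer) candidates receive codewords no longer than later ones,
# matching the P <= Q property stated above
lengths = [truncated_unary_length(k, len(ordered)) for k in range(len(ordered))]
```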
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
A beneficial effect of this implementation is that no pruning operation is performed when constructing the candidate motion information list, saving complexity.
In a feasible implementation, the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and adding the available target pixel point having the different motion information to the candidate motion information set of the image block to be processed.
A beneficial effect of this implementation is that the pruning operation removes redundant information from the candidate motion information list, improving coding efficiency.
In a feasible implementation, the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
A beneficial effect of this implementation is that limiting the number of obtained target pixel points balances coding performance against software and hardware cost, and in some specific implementations also avoids the decoder instability caused by an indeterminate total size of the candidate motion information list.
In a feasible implementation, the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the method is used to decode the image block to be processed and further includes: parsing a bitstream to obtain target motion residual information. Correspondingly, the predicting the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the image block to be processed, and before the obtaining target identification information, the method further includes: determining the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the obtaining target identification information includes: obtaining the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the method further includes: encoding the obtained target identification information.
In a feasible implementation, the method further includes: encoding the target motion residual information.
The foregoing feasible implementations apply the motion vector prediction method of this application to decoding and encoding methods for obtaining the motion vector of an image block to be processed, namely the merge prediction mode and the advanced motion vector prediction mode, improving the coding performance and efficiency of the original methods.
In a third aspect of the embodiments of this application, an apparatus for predicting motion information is provided, including: an obtaining module, configured to obtain at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located to the left of the image block to be processed and not adjacent to it; an indexing module, configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, the length of the binary representation of the target identification information is N when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M when the second candidate pixel point corresponds to the target motion information, and N is less than or equal to M; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are multiple second candidate pixel points, and the obtaining module is specifically configured to: obtain the multiple second candidate pixel points among the at least two target pixel points in the preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained second candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
In a feasible implementation, among the obtained at least two target pixel points, the motion information of at least two target pixel points is identical.
In a feasible implementation, the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of a currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel point having the different motion information as a target pixel point.
In a feasible implementation, the number of obtained target pixel points is a preset second threshold.
In a feasible implementation, the calculation module is specifically configured to: use the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to: parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the indexing module is specifically configured to: parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to: determine the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the indexing module is specifically configured to: obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the indexing module is further configured to: encode the obtained target identification information.
In a feasible implementation, the indexing module is further configured to: encode the target motion residual information.
In a fourth aspect of the embodiments of this application, an apparatus for predicting motion information is provided, including: a detection module, configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel point includes a candidate pixel point that is located to the left of the image block to be processed and is not adjacent to it, and the target pixel point is unavailable when the prediction mode of the image block in which it is located is intra prediction; an obtaining module, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; an indexing module, configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
In a feasible implementation, the detection module is specifically configured to: determine the availability of the image block in which the target pixel point is located.
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are multiple candidate pixel points and the multiple candidate pixel points are available, and the obtaining module is specifically configured to: add the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of a second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
In a feasible implementation, the obtaining module is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and add the available target pixel point having the different motion information to the candidate motion information set of the image block to be processed.
In a feasible implementation, the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
In a feasible implementation, the calculation module is specifically configured to: use the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to: parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the indexing module is specifically configured to: parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to: determine the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the indexing module is specifically configured to: obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the indexing module is further configured to: encode the obtained target identification information.
In a feasible implementation, the indexing module is further configured to: encode the target motion residual information.
According to a fifth aspect of the embodiments of this application, a motion information prediction device is provided, including a processor and a memory coupled to the processor, where the processor is configured to perform the method according to the first aspect or the second aspect.
According to a sixth aspect of the embodiments of this application, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or the second aspect.
According to a seventh aspect of the embodiments of this application, a computer program product including instructions is provided, which, when run on a computer, causes the computer to perform the method according to the first aspect or the second aspect.
It should be understood that the third to seventh aspects of this application are consistent with the technical solutions of the first aspect or the second aspect of this application, and the beneficial effects achieved by each aspect and the corresponding implementable designs are similar; details are not described again.
FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of this application;
FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of this application;
FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of this application;
FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of this application;
FIG. 5 is an example flowchart of a merge prediction mode according to an embodiment of this application;
FIG. 6 is an example flowchart of an advanced motion vector prediction mode according to an embodiment of this application;
FIG. 7 is an example flowchart of motion compensation performed by a video decoder according to an embodiment of this application;
FIG. 8 is an example schematic diagram of a coding unit and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 9 is an example flowchart of constructing a candidate predicted motion vector list according to an embodiment of this application;
FIG. 10 is an example schematic diagram of adding a combined candidate motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 11 is an example schematic diagram of adding a scaled candidate motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 12 is an example schematic diagram of adding a zero motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 13 is another example schematic diagram of a coding unit and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 14 is an example flowchart of a motion information prediction method according to an embodiment of this application;
FIG. 15 is an example schematic diagram of an image block to be processed and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 16 is an example schematic diagram of a right-to-left acquisition order according to an embodiment of this application;
FIG. 17 is an example schematic diagram of a top-to-bottom acquisition order according to an embodiment of this application;
FIG. 18 is an example schematic diagram of an upper-right-to-lower-left acquisition order according to an embodiment of this application;
FIGS. 19 to 30 are example schematic diagrams of different acquisition orders according to embodiments of this application;
FIG. 31 is another example flowchart of a motion information prediction method according to an embodiment of this application;
FIG. 32 is an example structural block diagram of a motion information prediction apparatus according to an embodiment of this application;
FIG. 33 is another example structural block diagram of a motion information prediction apparatus according to an embodiment of this application;
FIG. 34 is a schematic structural block diagram of a motion information prediction device according to an embodiment of this application.
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application.
FIG. 1 is a block diagram of an example video coding system 1 described in the embodiments of this application. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system 1 are configured to predict the motion information, for example a motion vector, of a currently coded image block or a sub-block thereof according to the various method examples described in any of the multiple new inter prediction modes proposed in this application, so that the predicted motion vector is as close as possible to the motion vector obtained by a motion estimation method. In this way, no motion vector difference needs to be transmitted during encoding, which further improves coding and decoding performance.
As shown in FIG. 1, the video coding system 1 includes a source device 10 and a destination device 20. The source device 10 generates encoded video data; therefore, the source device 10 may be referred to as a video encoding device. The destination device 20 may decode the encoded video data generated by the source device 10; therefore, the destination device 20 may be referred to as a video decoding device. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
The source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
The destination device 20 may receive the encoded video data from the source device 10 via a link 30. The link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may include one or more communication media that enable the source device 10 to transmit the encoded video data directly to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol), and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
In another example, the encoded data may be output from an output interface 140 to a storage device 40. Similarly, the encoded data may be accessed from the storage device 40 through an input interface 240. The storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a DVD, a CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
In another example, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded video generated by the source device 10. The destination device 20 may access the stored video data from the storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20. Example file servers include a web server (for example, for a website), an FTP server, a network-attached storage (NAS) device, or a local disk drive. The destination device 20 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, a cable modem, or the like), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
The motion vector prediction techniques of this application may be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system 1 illustrated in FIG. 1 is merely an example, and the techniques of this application may apply to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store the data in a memory, and/or a video decoding device may retrieve data from the memory and decode the data. In many examples, encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode the data.
In the example of FIG. 1, the source device 10 includes a video source 120, the video encoder 100, and an output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture device (for example, a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
The video encoder 100 may encode video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and/or playback.
In the example of FIG. 1, the destination device 20 includes an input interface 240, the video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data via the link 30 and/or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to it. Generally, the display device 220 displays the decoded video data. The display device 220 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not illustrated in FIG. 1, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams. In some examples, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, for example: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If this application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the techniques of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, and the like) may be regarded as one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
This application may generally refer to the video encoder 100 as "signaling" or "transmitting" certain information to another device such as the video decoder 200. The terms "signaling" and "transmitting" may generally refer to the transfer of syntax elements and/or other data used to decode the compressed video data. Such transfer may occur in real time or almost in real time. Alternatively, such communication may occur over a span of time, for example, when syntax elements are stored in an encoded bitstream on a computer-readable storage medium at encoding time; the decoding device may then retrieve the syntax elements at any time after they have been stored on this medium.
JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolved model of a video decoding device called the HEVC test model (HM). The latest H.265 standard document is available at http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and that standard document is incorporated herein by reference in its entirety. HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra prediction coding modes, whereas HM can provide up to 35 intra prediction coding modes.
JVET is committed to developing the H.266 standard. The H.266 standardization process is based on an evolved model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available at http://phenix.int-evry.fr/jvet, where the latest algorithm description is contained in JVET-F1001-v2; that algorithm description document is incorporated herein by reference in its entirety. Meanwhile, the reference software for the JEM test model is available at https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is likewise incorporated herein by reference in its entirety.
In general, the working model description of HM states that a video frame or image may be divided into a sequence of tree blocks, or largest coding units (LCUs), containing both luma and chroma samples; an LCU is also referred to as a CTU. A tree block serves a purpose similar to that of a macroblock of the H.264 standard. A slice contains several consecutive tree blocks in decoding order. A video frame or image may be partitioned into one or more slices. Each tree block may be split into coding units according to a quadtree. For example, a tree block that is the root node of the quadtree may be split into four child nodes, and each child node may in turn be a parent node split into another four child nodes. The final, unsplittable child nodes, which are the leaf nodes of the quadtree, constitute decoding nodes, for example, decoded video blocks. Syntax data associated with the decoded bitstream may define the maximum number of times a tree block can be split, and may also define the minimum size of a decoding node.
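As a rough illustration of the quadtree splitting described above, the following sketch enumerates the leaf blocks of a tree block. The predicate `should_split`, standing in for the encoder's actual split decision, is a hypothetical placeholder:

```python
def quadtree_leaves(x, y, size, min_size, should_split):
    """Recursively split a tree block into coding units: each split
    node becomes the parent of four equal square sub-blocks, and the
    unsplit leaves are the decoding nodes."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf node: a decoding node
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree_leaves(x + dx, y + dy, half, min_size, should_split)
    return leaves
```

For example, splitting a 64×64 tree block exactly once yields four 32×32 leaves.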
A coding unit includes a decoding node, prediction units (PUs), and transform units (TUs) associated with the decoding node. The size of the CU corresponds to the size of the decoding node, and its shape must be square. The size of a CU may range from 8×8 pixels up to the size of a tree block with a maximum of 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, the syntax data associated with a CU may describe how the CU is partitioned into one or more PUs. The partitioning mode may differ depending on whether the CU is skipped or encoded in direct mode, intra prediction mode, or inter prediction mode. A PU may be partitioned into a non-square shape. For example, the syntax data associated with a CU may also describe how the CU is partitioned into one or more TUs according to a quadtree. The shape of a TU may be square or non-square.
The HEVC standard allows transforms according to TUs, which may be different for different CUs. A TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, although this may not always be the case. The size of a TU is usually the same as or smaller than that of a PU. In some feasible implementations, a quadtree structure known as a "residual quadtree" (RQT) may be used to subdivide the residual samples corresponding to a CU into smaller units. The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with a TU may be transformed to produce transform coefficients, which may be quantized.
In general, a PU contains data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may contain data describing the intra prediction mode of the PU. As another feasible implementation, when the PU is inter-mode encoded, the PU may contain data defining the motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (for example, quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or the reference image list for the motion vector (for example, list 0, list 1, or list C).
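The motion data enumerated above can be pictured as a simple record. This is an illustrative sketch only; the field names are hypothetical rather than taken from the standard:

```python
from dataclasses import dataclass

@dataclass
class MotionData:
    """Illustrative record of the data that may define a PU's motion vector."""
    mv_x: int            # horizontal component, in sub-pel units
    mv_y: int            # vertical component, in sub-pel units
    precision_frac: int  # sub-pel divisions per pixel: 4 = quarter-pel, 8 = eighth-pel
    ref_idx: int         # index into the reference image list
    ref_list: str        # which reference image list: 'L0', 'L1', or 'LC'

    def in_pixels(self):
        """Convert the sub-pel components to pixel units."""
        return (self.mv_x / self.precision_frac, self.mv_y / self.precision_frac)
```

For example, a quarter-pel vector (6, -2) corresponds to a displacement of (1.5, -0.5) pixels.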
In general, a TU uses transform and quantization processes. A given CU with one or more PUs may also contain one or more TUs. After prediction, the video encoder 100 may calculate residual values corresponding to a PU. The residual values include pixel difference values, which may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding. This application generally uses the term "video block" to refer to the decoding node of a CU. In some specific applications, this application may also use the term "video block" to refer to a tree block containing a decoding node as well as PUs and TUs, for example, an LCU or a CU.
A video sequence usually contains a series of video frames or images. A group of pictures (GOP) illustratively includes a series of one or more video images. A GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere; the syntax data describes the number of images included in the GOP. Each slice of an image may contain slice syntax data describing the coding mode of the corresponding image. The video encoder 100 usually operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
As a feasible implementation, HM supports prediction with various PU sizes. Assuming that the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% segment is indicated by "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU on top and a 2N×1.5N PU at the bottom.
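A small sketch of the asymmetric partition geometry described above; the function name and mode strings are hypothetical labels chosen for illustration:

```python
def amp_partition(size, mode):
    """Pixel sizes (width, height) of the two PUs for the asymmetric
    partition modes of a size x size CU, where size = 2N: one direction
    is left unpartitioned, the other is split 25% / 75%."""
    q = size // 4  # 0.5N, the 25% share
    if mode == '2NxnU':
        return [(size, q), (size, size - q)]          # top 2Nx0.5N, bottom 2Nx1.5N
    if mode == '2NxnD':
        return [(size, size - q), (size, q)]          # top 2Nx1.5N, bottom 2Nx0.5N
    if mode == 'nLx2N':
        return [(q, size), (size - q, size)]          # left 0.5Nx2N, right 1.5Nx2N
    if mode == 'nRx2N':
        return [(size - q, size), (q, size)]          # left 1.5Nx2N, right 0.5Nx2N
    raise ValueError(mode)
```

For a 64×64 CU (N = 32), mode 2N×nU yields a 64×16 PU above a 64×48 PU.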
In this application, "N×N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N×M pixels, where M is not necessarily equal to N.
After intra-predictive or inter-predictive decoding using the PUs of a CU, the video encoder 100 may calculate the residual data for the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (for example, a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between the pixels of the unencoded image and the prediction values corresponding to the PUs. The video encoder 100 may form TUs containing the residual data of the CU, and then transform the TUs to produce the transform coefficients of the CU.
After any transform to produce transform coefficients, the video encoder 100 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent them, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
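The bit-depth reduction mentioned above, rounding an n-bit value down to an m-bit value, can be illustrated as follows. This is a simplified sketch of the idea, not the quantizer of any particular codec:

```python
def round_down_bits(value, n, m):
    """Reduce an n-bit value to m bits by dropping the (n - m) least
    significant bits, i.e. rounding the value down."""
    assert n >= m, "target bit depth must not exceed source bit depth"
    return value >> (n - m)
```

For example, the 8-bit value 183 (0b10110111) rounded down to 4 bits becomes 11 (0b1011).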
JEM模型对视频图像的编码结构进行了进一步的改进,具体的,被称为“四叉树结合二叉树”(QTBT)的块编码结构被引入进来。QTBT结构摒弃了HEVC中的CU,PU,TU等概念,支持更灵活的CU划分形状,一个CU可以正方形,也可以是长方形。一个CTU首先进行四叉树划分,该四叉树的叶节点进一步进行二叉树划分。同时,在二叉树划分中存在两种划分模式,对称水平分割和对称竖直分割。二叉树的叶节点被称为CU,JEM的CU在预测和变换的过程中都不可以被进一步划分,也就是说JEM的CU,PU,TU具有相同的块大小。在现阶段的JEM中,CTU的最大尺寸为256×256亮度像素。The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "Quad Tree Combined with Binary Tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC, and supports more flexible CU division shapes. A CU can be square or rectangular. A CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition. At the same time, there are two partitioning modes in binary tree partitioning, symmetrical horizontal partitioning and symmetrical vertical partitioning. The leaf nodes of a binary tree are called CUs. JEM's CUs cannot be further divided during the prediction and transformation process, which means that JEM's CU, PU, and TU have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luminance pixels.
In some feasible implementations, video encoder 100 may scan the quantized transform coefficients using a predefined scan order to produce a serialized vector that can be entropy encoded. In other feasible implementations, video encoder 100 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector according to context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method. Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
To perform CABAC, video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 100 may select a variable-length code for the symbol to be transmitted. Codewords in variable-length coding (VLC) may be constructed such that relatively short codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may save bits relative to using equal-length codewords for every symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
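The bit-saving property of VLC can be illustrated with a toy Huffman-style code (this is an assumption-laden sketch; the actual CAVLC tables are fixed by the standard and are not constructed this way at runtime):

```python
# Toy example, not the real CAVLC tables: assign shorter codewords to more
# probable symbols and compare the expected length against a fixed-length code.
import heapq

def huffman_lengths(probs):
    """Return the codeword length per symbol for a Huffman code over `probs`."""
    heap = [(p, [s]) for s, p in probs.items()]
    lengths = {s: 0 for s in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)          # merge the two least probable
        p2, s2 = heapq.heappop(heap)          # groups; every symbol inside
        for s in s1 + s2:                     # them gains one codeword bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(lengths)  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(avg)      # 1.75 bits/symbol, versus 2 bits for a fixed-length code
```

The most probable symbol receives a one-bit codeword, so the average rate (1.75 bits) is below the 2 bits that equal-length codewords for four symbols would require.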
In the embodiments of this application, the video encoder may perform inter prediction to reduce temporal redundancy between images. As described above, a CU may have one or more prediction units (PUs) according to the provisions of different video compression codec standards. In other words, multiple PUs may belong to one CU, or the PU and the CU may have the same size. Herein, when the CU and the PU have the same size, the partition mode of the CU is no partitioning, or the CU is partitioned into one PU, and the term PU is used uniformly. When the video encoder performs inter prediction, it may signal the motion information of the PU to the video decoder. Illustratively, the motion information of a PU may include a reference image index, a motion vector, and a prediction direction identifier. The motion vector may indicate a displacement between the image block of the PU (also called a video block, pixel block, pixel set, etc.) and a reference block of the PU. The reference block of the PU may be a portion of a reference image that is similar to the image block of the PU, and may be located in the reference image indicated by the reference image index and the prediction direction identifier.
To reduce the number of coding bits required to represent the motion information of the PUs, the video encoder may generate a candidate predicted motion vector (MV) list for each PU according to a merge prediction mode or an advanced motion vector prediction (AMVP) mode process. Each candidate predicted motion vector in the list for a PU may indicate motion information. The motion information indicated by some candidates in the list may be based on the motion information of other PUs. If a candidate indicates the motion information of one of the specified spatial or temporal candidate positions, this application may refer to that candidate as an "original" candidate predicted motion vector. For example, in merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate positions and one original temporal candidate position. In some examples, the video encoder may generate additional candidates by combining partial motion vectors from different original candidates, by modifying original candidates, or simply by inserting zero motion vectors as candidates. These additional candidates are not considered original candidate predicted motion vectors, and may be referred to in this application as artificially generated candidate predicted motion vectors.
The techniques of this application generally relate to generating a candidate predicted motion vector list at the video encoder and generating the same candidate predicted motion vector list at the video decoder. The video encoder and video decoder may generate the same list by implementing the same construction technique. For example, both may build a list with the same number of candidates (e.g., five). They may first consider spatial candidates (e.g., neighboring blocks in the same image), then temporal candidates (e.g., candidates in different images), and finally artificially generated candidates, until the desired number of candidates has been added to the list. According to the techniques of this application, a pruning operation may be applied to certain types of candidates during list construction, to remove duplicates from the list, while for other types of candidates pruning may be skipped in order to reduce decoder complexity. For example, for the set of spatial candidates and for the temporal candidate, a pruning operation may be performed to exclude candidates with duplicate motion information from the list. However, when an artificially generated candidate is added to the list, it may be added without performing the pruning operation on it.
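The ordering and selective pruning just described can be sketched as follows (candidate values and the list size of five are illustrative; the real derivation order and positions are fixed by the codec specification):

```python
# Hypothetical merge-list construction: spatial then temporal candidates are
# pruned against the list (duplicates dropped); artificially generated
# zero-MV candidates are then appended WITHOUT pruning until the list is full.
MAX_CANDIDATES = 5

def build_merge_list(spatial, temporal):
    """Each candidate is a (mv_x, mv_y, ref_idx) tuple."""
    cand_list = []
    for cand in spatial + temporal:            # original candidates: pruned
        if cand not in cand_list and len(cand_list) < MAX_CANDIDATES:
            cand_list.append(cand)
    while len(cand_list) < MAX_CANDIDATES:     # artificial candidates: no pruning
        cand_list.append((0, 0, 0))
    return cand_list

spatial = [(4, -2, 0), (4, -2, 0), (1, 3, 1)]  # contains a spatial duplicate
temporal = [(1, 3, 1)]                         # duplicates a spatial candidate
print(build_merge_list(spatial, temporal))
# [(4, -2, 0), (1, 3, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
```

Skipping the duplicate check for the zero-MV fillers is exactly the decoder-complexity trade-off the paragraph mentions: the padded entries may repeat, but no comparisons are spent on them.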
After generating the candidate predicted motion vector list for a PU of a CU, the video encoder may select a candidate from the list and output a candidate index in the bitstream. The selected candidate may be the one whose motion vector produces the predictor that most closely matches the target PU being coded. The candidate index may indicate the position of the selected candidate in the list. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the PU's motion information. The PU's motion information may be determined based on the motion information indicated by the selected candidate. For example, in merge mode the PU's motion information may be the same as the motion information indicated by the selected candidate, while in AMVP mode the PU's motion information may be determined based on the PU's motion vector difference and the motion information indicated by the selected candidate. The video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the CU's PUs and the original image block of the CU, and may then encode the one or more residual image blocks and output them in the bitstream.
The bitstream may include data identifying the selected candidate in the PU's candidate predicted motion vector list. The video decoder may determine the PU's motion information based on the motion information indicated by that selected candidate, and may identify one or more reference blocks for the PU based on the PU's motion information. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on them. The video decoder may reconstruct the image block of the CU based on the predictive image blocks of the CU's PUs and the one or more residual image blocks of the CU.
For ease of explanation, this application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description may be interpreted to mean that the position or image block has various spatial relationships with the image block associated with the CU or PU. In addition, the PU currently being decoded by the video decoder may be referred to herein as the current PU, also called the current to-be-processed image block; the CU currently being decoded may be referred to as the current CU; and the image currently being decoded may be referred to as the current image. It should be understood that this application also applies to the case where the PU and the CU have the same size, or the PU is the CU; the term PU is used uniformly.
As briefly described above, video encoder 100 may use inter prediction to generate predictive image blocks and motion information for the PUs of a CU. In many instances, the motion information of a given PU may be the same as or similar to that of one or more nearby PUs (i.e., PUs whose image blocks are spatially or temporally near the image block of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may encode the motion information of the given PU with reference to the motion information of nearby PUs. Encoding the given PU's motion information with reference to nearby PUs can reduce the number of coding bits required in the bitstream to indicate the given PU's motion information.
Video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of the given PU is the same as that of a nearby PU. This application may use "merge mode" to refer to indicating that the motion information of the given PU is the same as, or can be derived from, the motion information of a nearby PU. In another feasible implementation, video encoder 100 may calculate a motion vector difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU. Video encoder 100 may include the MVD, rather than the given PU's motion vector, in the given PU's motion information. Representing the MVD in the bitstream requires fewer coding bits than representing the given PU's motion vector. This application may use "advanced motion vector prediction (AMVP) mode" to refer to signaling the motion information of the given PU to the decoder by using an MVD and an index value identifying a candidate motion vector.
To signal the motion information of a given PU to the decoder using merge mode or AMVP mode, video encoder 100 may generate a candidate predicted motion vector list for the given PU. The list may include one or more candidate predicted motion vectors, each of which may specify motion information. The motion information indicated by each candidate may include a motion vector, a reference image index, and a prediction direction identifier. The candidates in the list may include "original" candidates, each of which indicates the motion information of one of the specified candidate positions within a PU other than the given PU.
After generating the candidate predicted motion vector list for the PU, video encoder 100 may select one of the candidates from the list. For example, the video encoder may compare each candidate with the PU being coded and select the candidate with the desired rate-distortion cost. Video encoder 100 may output a candidate index for the PU, which identifies the position of the selected candidate in the candidate predicted motion vector list.
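The encoder-side choice can be sketched with the usual Lagrangian cost J = D + λ·R; the distortion/rate pairs and λ below are toy stand-ins, not values prescribed by the patent:

```python
# Hedged sketch of rate-distortion candidate selection: score each candidate
# with J = D + lambda * R and pick the index of the cheapest one; that index
# is what gets written to the bitstream.
def select_candidate(costs, lam=0.5):
    """costs: list of (distortion, rate_bits) per candidate.
    Returns (best_index, best_cost)."""
    best_idx, best_cost = 0, float('inf')
    for idx, (dist, rate) in enumerate(costs):
        j = dist + lam * rate
        if j < best_cost:
            best_idx, best_cost = idx, j
    return best_idx, best_cost

costs = [(10.0, 4), (6.0, 6), (7.0, 3)]   # hypothetical (D, R) pairs
print(select_candidate(costs))            # (2, 8.5)
```

Candidate 1 has the lowest distortion, but candidate 2's cheaper rate wins once λ weighs the bits, which is the trade-off the rate-distortion criterion exists to arbitrate.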
In addition, video encoder 100 may generate a predictive image block for the PU based on the reference block indicated by the PU's motion information. The PU's motion information may be determined based on the motion information indicated by the selected candidate in the PU's candidate predicted motion vector list. For example, in merge mode the PU's motion information may be the same as the motion information indicated by the selected candidate; in AMVP mode the PU's motion information may be determined based on the PU's motion vector difference and the motion information indicated by the selected candidate. Video encoder 100 may process the predictive image block of the PU as described above.
When video decoder 200 receives the bitstream, it may generate a candidate predicted motion vector list for each PU of the CU. The list generated by video decoder 200 for a PU may be the same as the list generated by video encoder 100 for that PU. A syntax element parsed from the bitstream may indicate the position of the selected candidate in the PU's list. After generating the candidate list for the PU, video decoder 200 may generate a predictive image block for the PU based on the one or more reference blocks indicated by the PU's motion information. Video decoder 200 may determine the PU's motion information based on the motion information indicated by the selected candidate in the PU's list, and may reconstruct the image block of the CU based on the predictive image blocks of the PUs and the residual image blocks of the CU.
It should be understood that, in one feasible implementation, at the decoder side the construction of the candidate predicted motion vector list and the parsing of the selected candidate's position in the list from the bitstream are independent of each other, and may be performed in either order or in parallel.
In another feasible implementation, at the decoder side the position of the selected candidate in the list is first parsed from the bitstream, and the candidate list is then constructed based on the parsed position. In this implementation, it is not necessary to construct the entire candidate list; it is sufficient to construct the list up to the parsed position, i.e., far enough that the candidate at that position can be determined. For example, when parsing the bitstream shows that the selected candidate has index 3 in the list, only the candidates from index 0 to index 3 need to be constructed in order to determine the candidate at index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
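The early-exit behavior can be sketched with a generator, so that candidates beyond the parsed index are simply never derived (the candidate values are hypothetical):

```python
# Sketch of the decoder-side shortcut: parse the selected index first, then
# derive candidates lazily and stop as soon as the selected one is reached.
def derive_candidates():
    """Hypothetical lazy derivation, yielding candidates in list order."""
    yield from [(4, -2, 0), (0, 1, 0), (3, 3, 1), (1, 3, 1), (0, 0, 0)]

def candidate_at(parsed_index):
    """Build only indices 0..parsed_index and return the selected candidate."""
    for idx, cand in enumerate(derive_candidates()):
        if idx == parsed_index:
            return cand           # later candidates are never constructed
    raise ValueError("index beyond candidate list")

print(candidate_at(3))  # (1, 3, 1): only four candidates were built
```

With a parsed index of 3, the fifth candidate is never materialized, mirroring the complexity reduction the paragraph describes.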
FIG. 2 is a block diagram of an example video encoder 100 described in the embodiments of this application. Video encoder 100 is configured to output video to post-processing entity 41. Post-processing entity 41 represents an example of a video entity that can process encoded video data from video encoder 100, such as a media-aware network element (MANE) or a splicing/editing device. In some cases, post-processing entity 41 may be an instance of a network entity. In some video encoding systems, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100. In one example, post-processing entity 41 is an instance of storage device 40 of FIG. 1.
In the example of FIG. 2, video encoder 100 includes prediction processing unit 108, filter unit 106, decoded picture buffer (DPB) 107, summer 112, transformer 101, quantizer 102, and entropy encoder 103. Prediction processing unit 108 includes inter predictor 110 and intra predictor 109. For image block reconstruction, video encoder 100 also includes inverse quantizer 104, inverse transformer 105, and summer 111. Filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 106 is shown in FIG. 2A as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, video encoder 100 may further include a video data memory and a partitioning unit (not shown in the figure).
The video data memory may store video data to be encoded by the components of video encoder 100. The video data stored in the video data memory may be obtained from video source 120. DPB 107 may be a reference picture memory that stores reference video data used by video encoder 100 to encode video data in intra or inter coding modes. The video data memory and DPB 107 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
As shown in FIG. 2, video encoder 100 receives video data and stores the video data in the video data memory. The partitioning unit partitions the video data into image blocks, and these blocks may be further partitioned into smaller blocks, for example based on a quadtree structure or a binary tree structure. Such partitioning may also include partitioning into slices, tiles, or other larger units. Video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into multiple image blocks (and possibly into sets of image blocks called tiles). Prediction processing unit 108 may select one of multiple possible coding modes for the current image block, such as one of multiple intra coding modes or one of multiple inter coding modes. Prediction processing unit 108 may provide the resulting intra- or inter-coded block to summer 112 to generate a residual block, and to summer 111 to reconstruct the encoded block for use as part of a reference image.
Intra predictor 109 within prediction processing unit 108 may perform intra predictive encoding of the current image block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to remove spatial redundancy. Inter predictor 110 within prediction processing unit 108 may perform inter predictive encoding of the current image block relative to one or more prediction blocks in one or more reference images, to remove temporal redundancy.
Specifically, inter predictor 110 may be configured to determine the inter prediction mode used to encode the current image block. For example, inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in the candidate inter prediction mode set, and select the inter prediction mode with the best rate-distortion characteristics from among them. Rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, inter predictor 110 may determine that the inter prediction mode in the candidate set with the lowest rate-distortion cost for encoding the current image block is the inter prediction mode to be used for inter prediction of the current image block.
Inter predictor 110 is configured to predict the motion information (e.g., motion vectors) of one or more sub-blocks of the current image block based on the determined inter prediction mode, and to obtain or generate the prediction block of the current image block using the motion information (e.g., motion vectors) of the one or more sub-blocks. Inter predictor 110 may locate the prediction block pointed to by the motion vector in one of the reference image lists. Inter predictor 110 may also generate syntax elements associated with the image blocks and the video slice for use by video decoder 200 in decoding the image blocks of the video slice. Alternatively, in one example, inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block for each sub-block, thereby obtaining the prediction block of the current image block. It should be understood that inter predictor 110 here performs the motion estimation and motion compensation processes.
Specifically, after selecting the inter prediction mode for the current image block, inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to entropy encoder 103, so that entropy encoder 103 encodes the information indicating the selected inter prediction mode.
Intra predictor 109 may perform intra prediction on the current image block. Specifically, intra predictor 109 may determine the intra prediction mode used to encode the current block. For example, intra predictor 109 may use rate-distortion analysis to calculate rate-distortion values for the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from among the modes to be tested. In any case, after selecting the intra prediction mode for the image block, intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to entropy encoder 103, so that entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After prediction processing unit 108 generates the prediction block of the current image block via inter or intra prediction, video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 112 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more TUs and applied to transformer 101. Transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. Transformer 101 may convert the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
Transformer 101 may send the resulting transform coefficients to quantizer 102. Quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 102 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoder 103 may perform the scan.
After quantization, the entropy encoder 103 entropy-encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 103, the encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200. The entropy encoder 103 may also entropy-encode the syntax elements of the current image block to be encoded.
The inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and an inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image. The summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer 107, and may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or images.
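The reconstruction path (inverse quantization, then adding the prediction back) can be sketched as below. For brevity the transform is taken to be the identity and the sample values are hypothetical; this illustrates the dataflow through the inverse quantizer 104 and the summer 111, not their actual implementations.

```python
def dequantize(levels, step):
    # Inverse quantization: reconstructed coefficient = level * step.
    return [l * step for l in levels]

pred   = [50, 54, 60, 64]   # prediction block samples (hypothetical)
levels = [2, 1, 1, 2]       # quantized residual (step = 1, identity transform)

resid_rec = dequantize(levels, step=1)
# Summer 111: reconstructed block = prediction + reconstructed residual.
recon = [p + r for p, r in zip(pred, resid_rec)]
print(recon)  # -> [52, 55, 61, 66]; stored in the decoded image buffer 107
```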
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for certain image blocks or image frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105. Alternatively, for certain image blocks or image frames, the video encoder 100 generates no residual data, and correspondingly no processing by the transformer 101, the quantizer 102, the inverse quantizer 104, or the inverse transformer 105 is needed. Alternatively, the video encoder 100 may store the reconstructed image block directly as a reference block without processing by the filter unit 106; or the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
FIG. 3 is a block diagram of an example video decoder 200 described in an embodiment of this application. In the example of FIG. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process that is substantially the inverse of the encoding process described with respect to the video encoder 100 of FIG. 2.
During the decoding process, the video decoder 200 receives from the video encoder 100 an encoded video bitstream representing image blocks of an encoded video slice and associated syntax elements. The video decoder 200 may receive video data from the network entity 42 and, optionally, may also store the video data in a video data memory (not shown in the figure). The video data memory may store video data to be decoded by components of the video decoder 200, such as the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories. The video data memory and the DPB 207 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200, or provided off-chip relative to those components.
The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such device for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder, such as the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, the network entity 42 may implement some of the techniques described in this application. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to the network entity 42 may be performed by the same device that includes the video decoder 200. In some cases, the network entity 42 may be an example of the storage device 40 of FIG. 1.
The entropy decoder 203 of the video decoder 200 entropy-decodes the bitstream to produce quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. The video decoder 200 may receive syntax elements at the video slice level and/or the image block level.
When a video slice is decoded as an intra-decoded (I) slice, the intra predictor 209 of the prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image. When a video slice is decoded as an inter-decoded (i.e., B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether to predict the current image block of the current video slice using a new inter prediction mode. If the syntax elements indicate that a new inter prediction mode is to be used to predict the current image block, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block or of a sub-block of the current image block, and then, through a motion compensation process, uses the predicted motion information of the current image block or sub-block to obtain or generate a prediction block for the current image block or sub-block. The motion information here may include reference image information and a motion vector, where the reference image information may include, but is not limited to, unidirectional/bidirectional prediction information, a reference image list number, and a reference image index corresponding to a reference image list. For inter prediction, the prediction block may be generated from one of the reference images in one of the reference image lists. The video decoder 200 may construct the reference image lists, namely list 0 and list 1, based on the reference images stored in the DPB 207. The reference frame index of the current image may be included in one or more of reference frame list 0 and list 1. In some examples, the video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal a specific syntax element indicating both whether a new inter prediction mode is used and which new inter prediction mode is specifically used to decode the specific block. It should be understood that the inter predictor 210 here performs the motion compensation process.
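The construction of the two reference image lists from the DPB can be sketched as below. The ordering rule (list 0 prefers past pictures by increasing distance, list 1 prefers future pictures) is a common convention assumed for illustration; the picture order counts (POCs) are hypothetical.

```python
# Hypothetical contents of the DPB 207, identified by picture order count (POC).
dpb = [8, 4, 12, 16]
current_poc = 10

# List 0: past pictures first (nearest first), then future pictures.
list0 = sorted([p for p in dpb if p < current_poc],
               key=lambda p: current_poc - p) + \
        sorted([p for p in dpb if p > current_poc])

# List 1: future pictures first (nearest first), then past pictures.
list1 = sorted([p for p in dpb if p > current_poc],
               key=lambda p: p - current_poc) + \
        sorted([p for p in dpb if p < current_poc], reverse=True)

# A reference image index selects a picture by its position in a list.
ref_idx = 0
print(list0, list1, list0[ref_idx])
```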
The inverse quantizer 204 inverse-quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203. The inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that was applied and, likewise, the degree of inverse quantization that should be applied. The inverse transformer 205 applies an inverse transform to the transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to produce a residual block in the pixel domain.
After the inter predictor 210 generates a prediction block for the current image block or a sub-block of the current image block, the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210 to obtain a reconstructed block, i.e., a decoded image block. The summer 211 represents the component that performs this summing operation. When needed, a loop filter (in the decoding loop or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 206 is shown as an in-loop filter in FIG. 2B, in other implementations the filter unit 206 may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as a decoded video stream. In addition, the decoded image blocks of a given frame or image may also be stored in the decoded image buffer 207, which stores reference images used for subsequent motion compensation. The decoded image buffer 207 may be part of a memory that may also store decoded video for later presentation on a display device (such as the display device 220 of FIG. 1), or may be separate from such a memory.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video bitstream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and correspondingly no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
As noted above, the techniques of this application exemplarily relate to inter decoding. It should be understood that the techniques of this application may be performed by any of the video coders described in this application, including, for example, the video encoder 100 and the video decoder 200 as shown and described with respect to FIGS. 1 to 3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data. Therefore, a reference to a generic "video encoder" or "video decoder" may include the video encoder 100, the video decoder 200, or another video encoding or decoding unit.
FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of this application. The inter prediction module 121 may, for example, include a motion estimation unit 42 and a motion compensation unit 44. The relationship between PUs and CUs differs among video compression coding standards. The inter prediction module 121 may partition the current CU into PUs according to multiple partitioning modes. For example, the inter prediction module 121 may partition the current CU into PUs according to the 2N×2N, 2N×N, N×2N, and N×N partitioning modes. In other embodiments, the current CU is itself the current PU; this is not limited.
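The four partitioning modes named above can be sketched as a function mapping a CU size to PU rectangles. This is an illustrative enumeration of the geometry only, not the module's decision logic.

```python
def partition(cu_w, cu_h, mode):
    # Return the PU rectangles (x, y, w, h) for the classic partitioning
    # modes named in the text: 2Nx2N, 2NxN, Nx2N, NxN.
    if mode == "2Nx2N":
        return [(0, 0, cu_w, cu_h)]
    if mode == "2NxN":   # two horizontal halves
        return [(0, 0, cu_w, cu_h // 2), (0, cu_h // 2, cu_w, cu_h // 2)]
    if mode == "Nx2N":   # left and right halves
        return [(0, 0, cu_w // 2, cu_h), (cu_w // 2, 0, cu_w // 2, cu_h)]
    if mode == "NxN":    # four quadrants
        hw, hh = cu_w // 2, cu_h // 2
        return [(0, 0, hw, hh), (hw, 0, hw, hh),
                (0, hh, hw, hh), (hw, hh, hw, hh)]
    raise ValueError(mode)

for m in ("2Nx2N", "2NxN", "Nx2N", "NxN"):
    print(m, partition(16, 16, m))
```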
The inter prediction module 121 may perform integer motion estimation (IME) and then fractional motion estimation (FME) on each of the PUs. When the inter prediction module 121 performs IME on a PU, it may search one or more reference images for a reference block for the PU. After finding a reference block for the PU, the inter prediction module 121 may generate a motion vector that indicates, with integer precision, the spatial displacement between the PU and the reference block for the PU. When the inter prediction module 121 performs FME on the PU, it may refine the motion vector generated by performing IME on the PU. A motion vector generated by performing FME on a PU may have sub-integer precision (for example, 1/2-pixel precision, 1/4-pixel precision, etc.). After generating a motion vector for the PU, the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
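The IME step can be sketched as a full search minimizing a sum-of-absolute-differences (SAD) cost. The 1-D signals, search range, and SAD criterion are illustrative assumptions; FME would then refine the returned integer displacement at sub-pel positions using interpolated reference samples, which is omitted here for brevity.

```python
def sad(a, b):
    # Sum of absolute differences, a common matching cost.
    return sum(abs(x - y) for x, y in zip(a, b))

def integer_me(cur, ref, start, rng):
    # IME: full search over integer displacements in [-rng, rng].
    best_d, best_cost = None, float("inf")
    for d in range(-rng, rng + 1):
        pos = start + d
        if 0 <= pos and pos + len(cur) <= len(ref):
            cost = sad(cur, ref[pos:pos + len(cur)])
            if cost < best_cost:
                best_d, best_cost = d, cost
    return best_d

# 1-D toy example: the current block matches the reference at offset +2.
ref = [0, 0, 10, 20, 30, 40, 0, 0]
cur = [10, 20, 30, 40]
mv_int = integer_me(cur, ref, start=0, rng=3)
print(mv_int)  # integer-precision motion vector
```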
In some feasible implementations in which the inter prediction module 121 signals the motion information of the PU to the decoding end using the AMVP mode, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidates. After generating the candidate predicted motion vector list for the PU, the inter prediction module 121 may select a candidate predicted motion vector from the list and generate a motion vector difference (MVD) for the PU. The MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate and the motion vector generated for the PU using IME and FME. In these feasible implementations, the inter prediction module 121 may output a candidate predicted motion vector index identifying the position of the selected candidate in the candidate predicted motion vector list. The inter prediction module 121 may also output the MVD of the PU. A feasible implementation of the advanced motion vector prediction (AMVP) mode in an embodiment of this application is described in detail below with reference to FIG. 6.
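The AMVP signalling described above (index into the candidate list plus an MVD, rather than the full motion vector) can be sketched as follows. The candidate values and the selection criterion (smallest MVD magnitude) are illustrative assumptions.

```python
# Hypothetical candidate predicted motion vectors and the MV found by IME/FME.
candidates = [(4, 0), (3, -1), (0, 0)]
mv = (5, -1)

def mvd(a, b):
    # Motion vector difference: actual MV minus predicted MV.
    return (a[0] - b[0], a[1] - b[1])

# One possible encoder criterion: pick the candidate minimizing |MVD|.
idx = min(range(len(candidates)),
          key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
d = mvd(mv, candidates[idx])
print(idx, d)  # what gets signalled to the decoder

# Decoder side: reconstruct the MV from the index and the MVD.
recon_mv = (candidates[idx][0] + d[0], candidates[idx][1] + d[1])
print(recon_mv)
```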
In addition to generating motion information for a PU by performing IME and FME on the PU, the inter prediction module 121 may also perform a merge operation on each of the PUs. When the inter prediction module 121 performs a merge operation on a PU, it may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list for the PU may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidates. The original candidate predicted motion vectors in the list may include one or more spatial candidate predicted motion vectors and a temporal candidate predicted motion vector. A spatial candidate predicted motion vector may indicate the motion information of other PUs in the current image. A temporal candidate predicted motion vector may be based on the motion information of a corresponding PU in an image other than the current image. The temporal candidate predicted motion vector may also be referred to as temporal motion vector prediction (TMVP).
After generating the candidate predicted motion vector list, the inter prediction module 121 may select one of the candidate predicted motion vectors from the list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate predicted motion vector. FIG. 5, described below, illustrates an exemplary flowchart of merge.
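The merge behavior above — build a candidate list, then have the PU inherit the selected candidate's motion information unchanged — can be sketched as follows. The neighbour motion data and the duplicate-pruning rule are illustrative assumptions.

```python
# Hypothetical (MV, ref_idx) pairs from spatial neighbour PUs and a TMVP.
spatial  = [((2, 1), 0), ((2, 1), 0), ((0, 3), 1)]
temporal = [((1, 1), 0)]

merge_list = []
for cand in spatial + temporal:
    if cand not in merge_list:   # prune duplicate candidates
        merge_list.append(cand)

merge_idx = 1                    # index signalled in the bitstream
pu_motion = merge_list[merge_idx]
# In merge mode the PU's motion information equals the selected candidate's:
print(merge_list, pu_motion)
```

Note that, unlike AMVP, no MVD is signalled: the index alone determines the PU's motion information.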
After generating a predictive image block for the PU based on IME and FME and generating a predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block produced by the FME operation or the predictive image block produced by the merge operation. In some feasible implementations, the inter prediction module 121 may select the predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block produced by the FME operation and the predictive image block produced by the merge operation.
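A minimal sketch of the rate-distortion selection mentioned above, using the standard Lagrangian cost J = D + λ·R. The distortion, bit estimates, and λ are hypothetical numbers for illustration.

```python
def rd_cost(distortion, bits, lam):
    # Lagrangian rate-distortion cost: J = D + lambda * R.
    return distortion + lam * bits

# Hypothetical candidates: (name, distortion, estimated bits).
# Merge typically costs very few bits since no MVD is signalled.
modes = [("FME", 120.0, 40), ("merge", 150.0, 6)]
lam = 2.0

best = min(modes, key=lambda m: rd_cost(m[1], m[2], lam))
print(best[0])  # -> merge  (162.0 beats 200.0 at this lambda)
```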
After the inter prediction module 121 has selected the predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes (in some embodiments, after a coding tree unit (CTU) is divided into CUs, a CU is not further divided into smaller PUs, in which case the PU is equivalent to the CU), the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may select the partitioning mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes. The inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partitioning mode to the residual generation module 102, and may output syntax elements indicating the motion information of the PUs belonging to the selected partitioning mode to the entropy encoding module 116.
In the schematic diagram of FIG. 4, the inter prediction module 121 includes IME modules 180A to 180N (collectively, "IME modules 180"), FME modules 182A to 182N (collectively, "FME modules 182"), merge modules 184A to 184N (collectively, "merge modules 184"), PU mode decision modules 186A to 186N (collectively, "PU mode decision modules 186"), and a CU mode decision module 188 (which may also perform the mode decision process from CTU to CU).
The IME modules 180, the FME modules 182, and the merge modules 184 may perform IME operations, FME operations, and merge operations on the PUs of the current CU. The schematic diagram of FIG. 4 illustrates the inter prediction module 121 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merge module 184 for each PU of each partitioning mode of the CU.
As illustrated in the schematic diagram of FIG. 4, the IME module 180A, the FME module 182A, and the merge module 184A may perform an IME operation, an FME operation, and a merge operation on the PU generated by partitioning the CU according to the 2N×2N partitioning mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
The IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on the left PU generated by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
The IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on the right PU generated by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
The IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on the lower-right PU generated by partitioning the CU according to the N×N partitioning mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
The PU mode decision modules 186 may select a predictive image block based on a rate-distortion cost analysis of multiple possible predictive image blocks, choosing the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-limited applications, a PU mode decision module 186 may favor predictive image blocks that increase the compression ratio, while for other applications it may favor predictive image blocks that increase the quality of the reconstructed video. After the PU mode decision modules 186 select the predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects the partitioning mode for the current CU and outputs the predictive image blocks and the motion information of the PUs belonging to the selected partitioning mode.
FIG. 5 is an exemplary flowchart of a merge mode in an embodiment of this application. A video encoder (for example, the video encoder 100) may perform a merge operation 200. In other feasible implementations, the video encoder may perform a merge operation different from the merge operation 200. For example, in other feasible implementations, the video encoder may perform a merge operation in which it performs more or fewer steps than the merge operation 200, or steps different from those of the merge operation 200. In other feasible implementations, the video encoder may perform the steps of the merge operation 200 in a different order or in parallel. The encoder may also perform the merge operation 200 on a PU encoded in skip mode.
After the video encoder starts the merge operation 200, the video encoder may generate a candidate predicted motion vector list for the current PU (202). The video encoder may generate the candidate predicted motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate predicted motion vector list for the current PU according to one of the example techniques described below with respect to FIGS. 8 to 12.
As described above, the candidate predicted motion vector list for the current PU may include a temporal candidate predicted motion vector. The temporal candidate predicted motion vector may indicate the motion information of a temporally co-located PU. The co-located PU may be spatially at the same position in the image frame as the current PU, but in a reference image rather than the current image. In this application, a reference image that includes the temporally corresponding PU may be referred to as a related reference image, and the reference image index of the related reference image may be referred to as a related reference image index. As described above, the current image may be associated with one or more reference image lists (for example, list 0, list 1, etc.). A reference image index may indicate a reference image by indicating its position in a certain reference image list. In some feasible implementations, the current image may be associated with a combined reference image list.
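A common way to derive a temporal candidate (TMVP) from the co-located PU is to take the co-located PU's motion vector and scale it by the ratio of POC distances. This scaling scheme and all numbers below are illustrative assumptions, not the specific derivation claimed by this application.

```python
def scale_mv(mv, cur_diff, col_diff):
    # Scale the co-located MV by the ratio of POC distances:
    # current picture's distance to its reference vs. the co-located
    # picture's distance to the reference its MV points at.
    return tuple(v * cur_diff // col_diff for v in mv)

col_mv   = (8, -4)  # MV of the co-located PU in the related reference image
col_diff = 4        # POC distance spanned by the co-located PU's MV
cur_diff = 2        # POC distance from the current picture to its reference

tmvp = scale_mv(col_mv, cur_diff, col_diff)
print(tmvp)  # temporal candidate added to the candidate list
```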
In some video encoders, the related reference image index is the reference image index of the PU that covers the reference index source location associated with the current PU. In these video encoders, the reference index source location associated with the current PU is adjacent to the left of, or adjacent above, the current PU. In this application, a PU may "cover" a specific location if the image block associated with the PU includes that location. In these video encoders, if the reference index source location is unavailable, the video encoder may use a reference image index of zero.
However, there may be cases in which the reference index source position associated with the current PU is within the current CU. In these cases, the PU covering the reference index source position associated with the current PU may be considered available if that PU is above or to the left of the current CU. However, the video encoder may then need to access the motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use the motion information (that is, the reference picture index) of a PU belonging to the current CU to generate the temporal candidate prediction motion vector for the current PU. In other words, these video encoders may use the motion information of a PU belonging to the current CU to generate the temporal candidate prediction motion vector. Consequently, the video encoder may be unable to generate, in parallel, the candidate prediction motion vector lists for the current PU and for the PU covering the reference index source position associated with the current PU.
According to the techniques of this application, the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and the other PUs of the current CU in parallel. Because the video encoder sets the relevant reference picture index explicitly, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations in which the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (for example, 0). In this way, the video encoder may generate a temporal candidate prediction motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate prediction motion vector in the candidate prediction motion vector list of the current CU.
In a feasible implementation in which the video encoder explicitly sets the relevant reference picture index, the video encoder may explicitly signal the relevant reference picture index in a syntax structure (for example, a picture header, a slice header, an APS, or another syntax structure). In this feasible implementation, the video encoder may signal to the decoder the relevant reference picture index for each LCU (that is, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of a CU is equal to "1".
In some feasible implementations, the relevant reference picture index may be set implicitly rather than explicitly. In these feasible implementations, the video encoder may generate each temporal candidate prediction motion vector in the candidate prediction motion vector lists for the PUs of the current CU using the motion information of PUs in the reference pictures indicated by the reference picture indices of PUs covering positions outside the current CU, even if these positions are not strictly adjacent to the current PU.
After generating the candidate prediction motion vector list for the current PU, the video encoder may generate the predictive image blocks associated with the candidate prediction motion vectors in the list (204). The video encoder may generate the predictive image block associated with a candidate prediction motion vector by determining the motion information of the current PU based on the motion information of the indicated candidate prediction motion vector and then generating the predictive image block based on the one or more reference blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidate prediction motion vectors from the candidate prediction motion vector list (206). The video encoder may select the candidate prediction motion vector in various ways. For example, the video encoder may select one of the candidate prediction motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate prediction motion vectors.
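The selection in step 206 can be sketched as follows. This is a minimal illustration rather than the encoder's actual implementation: `rd_cost` is a hypothetical callback standing in for whatever rate-distortion measure the encoder applies to each candidate's predictive image block.

```python
def select_merge_candidate(candidates, rd_cost):
    """Pick the candidate prediction motion vector whose predictive image
    block has the lowest rate-distortion cost. `candidates` is the candidate
    prediction motion vector list; `rd_cost(index, motion_info)` is a
    hypothetical callback returning the cost of coding the current PU with
    that candidate."""
    best_idx, best_cost = None, float("inf")
    for idx, motion_info in enumerate(candidates):
        cost = rd_cost(idx, motion_info)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx  # signalled to the decoder as the index (e.g., merge_idx)
```

The returned index is what step 208 outputs: the position of the selected candidate in the list, not the motion vector itself.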
After selecting a candidate prediction motion vector, the video encoder may output the candidate prediction motion vector index (208). The candidate prediction motion vector index may indicate the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. In some feasible implementations, the candidate prediction motion vector index may be represented as "merge_idx".
FIG. 6 is an exemplary flowchart of the advanced motion vector prediction (AMVP) mode in an embodiment of this application. A video encoder (for example, video encoder 100) may perform AMVP operation 210.
After the video encoder starts AMVP operation 210, the video encoder may generate one or more motion vectors for the current PU (211). The video encoder may perform integer motion estimation and fractional motion estimation to generate the motion vectors for the current PU. As described above, the current picture may be associated with two reference picture lists (list 0 and list 1). If the current PU is unidirectionally predicted, the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU. The list 0 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference picture in list 0. The list 1 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference picture in list 1. If the current PU is bidirectionally predicted, the video encoder may generate both a list 0 motion vector and a list 1 motion vector for the current PU.
After generating the one or more motion vectors for the current PU, the video encoder may generate the predictive image block for the current PU (212). The video encoder may generate the predictive image block for the current PU based on the one or more reference blocks indicated by the one or more motion vectors of the current PU.
In addition, the video encoder may generate a candidate prediction motion vector list for the current PU (213). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate prediction motion vector list for the current PU according to one or more of the feasible implementations described below with respect to FIG. 8 to FIG. 12. In some feasible implementations, when the video encoder generates the candidate prediction motion vector list in AMVP operation 210, the list may be limited to two candidate prediction motion vectors. In contrast, when the video encoder generates a candidate prediction motion vector list in a merge operation, the list may include more candidate prediction motion vectors (for example, five candidate prediction motion vectors).
After generating the candidate prediction motion vector list for the current PU, the video encoder may generate one or more motion vector differences (MVDs) for each candidate prediction motion vector in the list (214). The video encoder may generate the motion vector difference for a candidate prediction motion vector by determining the difference between the motion vector indicated by the candidate prediction motion vector and the corresponding motion vector of the current PU.
If the current PU is unidirectionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bidirectionally predicted, the video encoder may generate two MVDs for each candidate prediction motion vector. The first MVD may indicate the difference between the motion vector of the candidate prediction motion vector and the list 0 motion vector of the current PU. The second MVD may indicate the difference between the motion vector of the candidate prediction motion vector and the list 1 motion vector of the current PU.
The video encoder may select one or more of the candidate prediction motion vectors from the candidate prediction motion vector list (215). The video encoder may select the one or more candidate prediction motion vectors in various ways. For example, the video encoder may select the candidate prediction motion vector whose associated motion vector matches the motion vector to be encoded with the smallest error, which may reduce the number of bits needed to represent the motion vector difference for the candidate prediction motion vector.
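Steps 214 and 215 can be sketched as follows. This is a minimal illustration under the assumption that motion vectors are simple (x, y) integer pairs and that the bit cost of an MVD is approximated by the sum of its absolute components; an actual encoder's cost model would differ.

```python
def mvd(mv, predictor):
    """Motion vector difference between the PU's motion vector and a
    candidate prediction motion vector; both are (x, y) pairs."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def select_predictor(mv, candidates):
    """Select the candidate whose MVD is cheapest to code, approximating
    the bit cost by the sum of absolute MVD components (an assumption for
    illustration). Returns the candidate index to signal and the MVD."""
    costs = [abs(mv[0] - c[0]) + abs(mv[1] - c[1]) for c in candidates]
    best = costs.index(min(costs))
    return best, mvd(mv, candidates[best])
```

For a bidirectionally predicted PU, this selection would simply be performed twice, once against the list 0 motion vector and once against the list 1 motion vector.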
After selecting the one or more candidate prediction motion vectors, the video encoder may output, for the current PU, one or more reference picture indices, one or more candidate prediction motion vector indices, and one or more motion vector differences for the one or more selected candidate prediction motion vectors (216).
In the case where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is unidirectionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. Alternatively, the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector or the list 1 motion vector of the current PU.
In the case where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bidirectionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. In addition, the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector of the current PU and the MVD for the list 1 motion vector of the current PU.
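For a bidirectionally predicted PU, the signalled elements described above can be grouped as in the following sketch. The field names follow the HEVC-style syntax element names referenced in the text; the structure itself is illustrative, not a normative bitstream layout, and the (x, y) pair representation of an MVD is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AmvpBiPredSignal:
    """Elements output for one bidirectionally predicted PU in AMVP mode."""
    ref_idx_l0: int          # reference picture index into list 0
    ref_idx_l1: int          # reference picture index into list 1
    mvp_l0_flag: int         # position of the selected list 0 candidate
    mvp_l1_flag: int         # position of the selected list 1 candidate
    mvd_l0: Tuple[int, int]  # MVD for the list 0 motion vector
    mvd_l1: Tuple[int, int]  # MVD for the list 1 motion vector
```

For a unidirectionally predicted PU, only the list 0 or the list 1 triple (reference picture index, candidate index, MVD) would be present.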
FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder (for example, video decoder 200) in an embodiment of this application.
When the video decoder performs motion compensation operation 220, the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
If the motion information of the current PU is encoded using the AMVP mode and the current PU is bidirectionally predicted, the video decoder may receive a first candidate prediction motion vector index and a second candidate prediction motion vector index. The first candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. The second candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. In some feasible implementations, a single syntax element may be used to identify the two candidate prediction motion vector indices.
In addition, the video decoder may generate a candidate prediction motion vector list for the current PU (224). The video decoder may generate this candidate prediction motion vector list for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to FIG. 8 to FIG. 12 to generate the candidate prediction motion vector list for the current PU. When the video decoder generates a temporal candidate prediction motion vector for the candidate prediction motion vector list, the video decoder may explicitly or implicitly set the reference picture index identifying the reference picture that includes the co-located PU, as described above with respect to FIG. 5.
After generating the candidate prediction motion vector list for the current PU, the video decoder may determine the motion information of the current PU based on the motion information indicated by the one or more selected candidate prediction motion vectors in the candidate prediction motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct the one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidate prediction motion vectors and the one or more MVDs indicated in the bitstream. The reference picture index and the prediction direction identifier of the current PU may be the same as the reference picture index and the prediction direction identifier of the one or more selected candidate prediction motion vectors. After determining the motion information of the current PU, the video decoder may generate the predictive image block for the current PU based on the one or more reference blocks indicated by the motion information of the current PU (226).
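The AMVP reconstruction in step 225 amounts to adding each signalled MVD back to the predictor selected from the list. A minimal sketch, again under the assumption that motion vectors are (x, y) integer pairs:

```python
def reconstruct_mv(candidates, mvp_index, mvd):
    """Rebuild one motion vector on the decoder side. `candidates` is the
    candidate prediction motion vector list (derived identically to the
    encoder's), `mvp_index` is the signalled candidate index, and `mvd` is
    the signalled motion vector difference."""
    predictor = candidates[mvp_index]
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])
```

Because the decoder derives the same candidate list as the encoder, signalling only the index and the MVD is sufficient to recover the motion vector exactly.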
FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and the neighbouring position image blocks associated with it in an embodiment of this application, illustrating CU 250 and exemplary candidate prediction motion vector positions 252A to 252E associated with CU 250. This application may collectively refer to candidate prediction motion vector positions 252A to 252E as candidate prediction motion vector positions 252. Candidate prediction motion vector positions 252 represent spatial candidate prediction motion vectors in the same picture as CU 250. Candidate prediction motion vector position 252A is located to the left of CU 250. Candidate prediction motion vector position 252B is located above CU 250. Candidate prediction motion vector position 252C is located above and to the right of CU 250. Candidate prediction motion vector position 252D is located below and to the left of CU 250. Candidate prediction motion vector position 252E is located above and to the left of CU 250. FIG. 8 provides a schematic implementation of the manner in which inter prediction module 121 and motion compensation module 162 may generate candidate prediction motion vector lists. The implementations below are explained with reference to inter prediction module 121, but it should be understood that motion compensation module 162 may implement the same techniques and thus generate the same candidate prediction motion vector list.
FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of this application. The technique of FIG. 9 will be described with reference to a list that includes five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes. The five candidate prediction motion vectors may each have an index (for example, 0 to 4). The technique of FIG. 9 will be described with reference to a generic video decoder. The generic video decoder may be, for example, a video encoder (such as video encoder 100) or a video decoder (such as video decoder 200).
To construct the candidate prediction motion vector list according to the implementation of FIG. 9, the video decoder first considers four spatial candidate prediction motion vectors (902). The four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate prediction motion vectors correspond to the motion information of four PUs in the same picture as the current CU (for example, CU 250). The video decoder may consider the four spatial candidate prediction motion vectors in the list in a specific order. For example, candidate prediction motion vector position 252A may be considered first. If candidate prediction motion vector position 252A is available, it may be assigned to index 0. If candidate prediction motion vector position 252A is unavailable, the video decoder may not include it in the candidate prediction motion vector list. A candidate prediction motion vector position may be unavailable for various reasons. For example, a candidate prediction motion vector position may be unavailable if it is not within the current picture. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is intra predicted. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is in a different slice from the current CU.
After considering candidate prediction motion vector position 252A, the video decoder may next consider candidate prediction motion vector position 252B. If candidate prediction motion vector position 252B is available and different from candidate prediction motion vector position 252A, the video decoder may add candidate prediction motion vector position 252B to the candidate prediction motion vector list. In this particular context, the terms "same" and "different" refer to the motion information associated with the candidate prediction motion vector positions. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information, and considered different if they have different motion information. If candidate prediction motion vector position 252A is unavailable, the video decoder may assign candidate prediction motion vector position 252B to index 0. If candidate prediction motion vector position 252A is available, the video decoder may assign candidate prediction motion vector position 252B to index 1. If candidate prediction motion vector position 252B is unavailable or the same as candidate prediction motion vector position 252A, the video decoder skips candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
Candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If candidate prediction motion vector position 252C is available and not the same as candidate prediction motion vector positions 252B and 252A, the video decoder assigns candidate prediction motion vector position 252C to the next available index. If candidate prediction motion vector position 252C is unavailable, or is not different from at least one of candidate prediction motion vector positions 252A and 252B, the video decoder does not include candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers candidate prediction motion vector position 252D. If candidate prediction motion vector position 252D is available and not the same as candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns candidate prediction motion vector position 252D to the next available index. If candidate prediction motion vector position 252D is unavailable, or is not different from at least one of candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include candidate prediction motion vector position 252D in the candidate prediction motion vector list. The above implementations generally describe considering candidate prediction motion vectors 252A to 252D, by way of example, for inclusion in the candidate prediction motion vector list, but in some implementations all of candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list later.
After the video decoder considers the first four spatial candidate prediction motion vectors, the candidate prediction motion vector list may include four spatial candidate prediction motion vectors, or the list may include fewer than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, yes), the video decoder considers a temporal candidate prediction motion vector (906). The temporal candidate prediction motion vector may correspond to the motion information of the co-located PU of a picture different from the current picture. If the temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4. If the temporal candidate prediction motion vector is unavailable or the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector considered at block 906), or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, yes), the video decoder completes building the list.
If the candidate prediction motion vector list includes four candidate prediction motion vectors (908, no), the video decoder may consider a fifth spatial candidate prediction motion vector (910). The fifth spatial candidate prediction motion vector may, for example, correspond to candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, with the fifth spatial candidate prediction motion vector assigned to index 4. If the candidate prediction motion vector at position 252E is unavailable, or is not different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. Therefore, after the fifth spatial candidate prediction motion vector is considered (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910), or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
If the candidate prediction motion vector list includes five candidate prediction motion vectors (912, yes), the video decoder completes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, no), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, yes).
If, after the video decoder considers the first four spatial candidates, the list includes fewer than four spatial candidates (904, No), the video decoder may consider a fifth spatial candidate (918). The fifth spatial candidate may, for example, correspond to candidate position 252E. If the candidate at position 252E is available and different from the candidates already included in the list, the video decoder may add the fifth spatial candidate to the list, the fifth spatial candidate being assigned to the next available index. If the candidate at position 252E is unavailable or is not different from one of the candidates already included in the list, the video decoder may not include the candidate at position 252E in the list. The video decoder may then consider a temporal candidate (920). If the temporal candidate is available and different from the candidates already included in the list, the video decoder may add the temporal candidate to the list, the temporal candidate being assigned to the next available index. If the temporal candidate is unavailable or is not different from one of the candidates already included in the list, the video decoder may not include the temporal candidate in the list.
If, after the fifth spatial candidate (block 918) and the temporal candidate (block 920) are considered, the candidate predicted motion vector list includes five candidates (922, Yes), the video decoder finishes generating the list. If the list includes fewer than five candidates (922, No), the video decoder adds artificially generated candidates (914) until the list includes five candidates (916, Yes).
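As a non-normative sketch of the blocks 902-922 flow described above (function and variable names are illustrative, not from the patent), the list construction can be expressed as:

```python
MAX_CANDIDATES = 5  # fixed list size, per the FIG. 9 implementation

def build_merge_list(first_four_spatial, fifth_spatial, temporal, artificial):
    """Candidates are motion-information tuples; None marks an
    unavailable position. Duplicates are never added twice."""
    cands = []
    for c in first_four_spatial:              # blocks 902/904
        if c is not None and c not in cands:
            cands.append(c)
    if len(cands) == 4:                       # 904 Yes -> blocks 908/910
        if fifth_spatial is not None and fifth_spatial not in cands:
            cands.append(fifth_spatial)       # assigned index 4
    else:                                     # 904 No -> blocks 918/920
        if fifth_spatial is not None and fifth_spatial not in cands:
            cands.append(fifth_spatial)
        if temporal is not None and temporal not in cands:
            cands.append(temporal)
    for c in artificial:                      # blocks 914/916: pad to five
        if len(cands) >= MAX_CANDIDATES:
            break
        cands.append(c)
    return cands
```

With four distinct spatial candidates plus an available fifth, the artificial padding never runs; with duplicates or unavailable positions, the temporal and artificial candidates fill the remaining slots.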
According to the techniques of this application, additional merge candidates may be artificially generated after the spatial and temporal candidates so that the size of the merge candidate list is fixed at a specified number of merge candidates (for example, five in the feasible implementation of FIG. 9 above). The additional merge candidates may include, exemplarily, a combined bi-predictive merge candidate (candidate 1), a scaled bi-predictive merge candidate (candidate 2), and a zero-vector Merge/AMVP candidate (candidate 3).
FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to the merge-mode candidate list in an embodiment of this application. A combined bi-predictive merge candidate may be generated by combining original merge candidates. Specifically, two of the original candidates (having mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate a bi-predictive merge candidate. In FIG. 10, two candidates are included in the original merge candidate list. The prediction type of one candidate is list-0 uni-prediction, and the prediction type of the other is list-1 uni-prediction. In this feasible implementation, mvL0_A and ref0 are taken from list 0, and mvL1_B and ref0 are taken from list 1; a bi-predictive merge candidate (with mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may then be generated and checked against the candidates already included in the list. If it is different from them, the video decoder may include the bi-predictive merge candidate in the candidate list.
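A minimal sketch of the FIG. 10 combination, under the assumption that a candidate is represented as a (motion vector, reference index) pair (the representation and names are hypothetical):

```python
def combined_bipred_candidate(list0_cand, list1_cand, cand_list):
    """Pair a list-0 uni-prediction candidate (mvL0_A, ref0) with a
    list-1 candidate (mvL1_B, ref0) into one bi-predictive candidate,
    adding it only if it differs from every existing candidate."""
    mvL0_A, ref0 = list0_cand
    mvL1_B, ref0_l1 = list1_cand
    combined = (('L0', mvL0_A, ref0), ('L1', mvL1_B, ref0_l1))
    if combined not in cand_list:
        cand_list.append(combined)
    return cand_list
```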
FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to the merge-mode candidate list in an embodiment of this application. A scaled bi-predictive merge candidate may be generated by scaling an original merge candidate. Specifically, one candidate from the original candidates (which may have mvLX and refIdxLX) may be used to generate a bi-predictive merge candidate. In the feasible implementation of FIG. 11, two candidates are included in the original merge candidate list. The prediction type of one candidate is list-0 uni-prediction, and the prediction type of the other is list-1 uni-prediction. In this feasible implementation, mvL0_A and ref0 may be taken from list 0, and ref0 may be copied into list 1 as reference index ref0′. Then mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0′; the scaling may depend on the POC distance. Next, a bi-predictive merge candidate (with mvL0_A and ref0 in list 0 and mvL0′_A and ref0′ in list 1) may be generated and checked for being a duplicate. If it is not a duplicate, it may be added to the merge candidate list.
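The POC-distance scaling described above can be sketched as follows. This uses plain floating-point linear scaling as an assumption; a real codec performs this step in fixed-point arithmetic with specific clipping and rounding:

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector by the ratio of POC distances."""
    td = poc_cur - poc_ref_src   # distance to the original reference ref0
    tb = poc_cur - poc_ref_dst   # distance to the copied reference ref0'
    factor = tb / td
    return (mv[0] * factor, mv[1] * factor)

def scaled_bipred_candidate(mvL0_A, ref0_poc, cur_poc, ref0p_poc):
    """FIG. 11-style sketch: copy ref0 into list 1 as ref0', derive
    mvL0'_A by POC-distance scaling, and pair the two lists."""
    mvL0p_A = scale_mv(mvL0_A, cur_poc, ref0_poc, ref0p_poc)
    return {'L0': (mvL0_A, ref0_poc), 'L1': (mvL0p_A, ref0p_poc)}
```

For example, doubling the POC distance doubles the scaled vector's components.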
FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to the merge-mode candidate list in an embodiment of this application. A zero-vector merge candidate may be generated by combining a zero vector with a referenceable reference index. If the zero-vector candidate is not a duplicate, it may be added to the merge candidate list. For each generated merge candidate, its motion information may be compared with the motion information of the preceding candidates in the list.
In a feasible implementation, if a newly generated candidate is different from the candidates already included in the candidate list, the generated candidate is added to the merge candidate list. The process of determining whether a candidate is different from the candidates already included in the list is sometimes called pruning. With pruning, each newly generated candidate may be compared with the existing candidates in the list. In some feasible implementations, the pruning operation may include comparing one or more new candidates with the candidates already in the list and not adding a new candidate that is a duplicate of a candidate already in the list. In other feasible implementations, the pruning operation may include adding one or more new candidates to the list and later removing duplicate candidates from the list. It should be understood that, in still other feasible implementations, the pruning step may be omitted.
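The two pruning variants just described can be sketched as follows (an illustrative sketch; candidate representation is assumed to be hashable tuples):

```python
def prune_before_add(cand_list, new_cands):
    """Variant 1: compare each newly generated candidate against the
    list and skip duplicates before adding."""
    for c in new_cands:
        if c not in cand_list:
            cand_list.append(c)
    return cand_list

def add_then_dedupe(cand_list, new_cands):
    """Variant 2: add everything first, then remove duplicates,
    keeping the first occurrence of each candidate."""
    merged = list(cand_list) + list(new_cands)
    out = []
    for c in merged:
        if c not in out:
            out.append(c)
    return out
```

Both variants produce the same final list; they differ only in when the duplicate check happens.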
In the various feasible implementations of FIGS. 5-7 and 9-12 above, the spatial candidate prediction modes exemplarily come from the five positions 252A to 252E shown in FIG. 8, that is, positions adjacent to the image block to be processed. On the basis of those implementations, in some feasible implementations the spatial candidate prediction modes may exemplarily further include positions within a preset distance of the image block to be processed but not adjacent to it. Exemplarily, such positions may be as shown at 252F to 252J in FIG. 13. It should be understood that FIG. 13 is an exemplary schematic diagram of a coding unit and the neighboring-position image blocks associated with it in an embodiment of this application. Positions within image blocks that are in the same frame as the image block to be processed, are not adjacent to it, and have already been reconstructed when the image block to be processed is processed all fall within the range of such positions.
FIG. 14 is an exemplary flowchart of a motion information prediction method in an embodiment of this application, which specifically includes the following steps:
S1401. Obtain at least two target pixels having a preset positional relationship with the image block to be processed.
The target pixels include a first candidate pixel adjacent to the image block to be processed and a second candidate pixel located to the left of the image block to be processed and not adjacent to it.
FIG. 8 schematically shows the adjacent positions 252A-252E of coding unit 250, and FIG. 13 schematically shows the non-adjacent positions 252F-252J of coding unit 250. It should be understood that each of the above positions may be used to indicate either the image block covering that position or the pixel at that position. It should also be understood that the image block to be processed in the embodiments of this application is a set of pixels to be processed and is not limited to a coding unit, a coding sub-unit, or a prediction unit; correspondingly, coding unit 250 may serve as the image block to be processed in the embodiments of this application.
The left side of the image block to be processed includes the region directly to its left (for example, the position corresponding to 252A), and may also include the upper left (for example, the position corresponding to 252D) and the lower left (for example, the position corresponding to 252E).
In a feasible implementation, as shown in FIG. 15, a rectangular coordinate system may be established with the pixel at the top-left vertex of the image block to be processed as the origin, the line along the top edge of the block as the horizontal axis with rightward as the positive direction, and the line along the left edge of the block as the vertical axis with downward as the positive direction.
In an embodiment of the present invention, the position of the second candidate pixel may be at least one of the following coordinate points in this coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, the position of the second candidate pixel may exclude the region to the upper left of the image block to be processed; that is, the position of the second candidate pixel may be at least one of the following coordinate points in this coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j+h-1), (-w×i-1, h×j+h-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is the height of the image block to be processed.
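The coordinate formulas above can be enumerated directly. The following sketch generates the first set of second-candidate positions for bounded i and j (the bounds `i_max` and `j_max` are illustrative parameters, not from the text):

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the non-adjacent positions (-1, h*i-1+h), (-1, h*i+h),
    (-w*i, h*j-1), (-w*i-1, h*j-1), (-w*i, h*j), (-w*i-1, h*j) in the
    FIG. 15 coordinate system (origin at the top-left vertex of the
    block, x positive rightward, y positive downward)."""
    pts = set()
    for i in range(1, i_max + 1):
        pts.add((-1, h * i - 1 + h))
        pts.add((-1, h * i + h))
        for j in range(0, j_max + 1):
            pts.add((-w * i, h * j - 1))
            pts.add((-w * i - 1, h * j - 1))
            pts.add((-w * i, h * j))
            pts.add((-w * i - 1, h * j))
    return pts
```

All generated positions have a negative x-coordinate, i.e., they lie strictly to the left of the block's left edge, consistent with the "left side and not adjacent" constraint when i > 0.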
In another feasible implementation, once the motion information of an image block has been determined, it is stored in a motion vector matrix for use when processing subsequent image blocks. For example, the whole frame may be mapped onto a set of pixel units in which each 4x4 set of pixels is one unit, each 4x4 set corresponding to one piece of motion information; extracting the motion information corresponding to each 4x4 set then forms a motion information matrix corresponding to the original image, which may also be called a motion vector field. In the embodiments of this application, the positions described above are obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, where w is the sampling width interval of the motion vector field and h is the sampling height interval. It should be understood that in this implementation, the determination of w is independent of the width of the image block to be processed, and the determination of h is independent of its height.
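A minimal sketch of sampling such a motion vector field, under the simplifying assumption that w and h are expressed as strides over the 4x4-unit matrix rather than in pixels:

```python
def sample_mv_field(mv_field, w, h):
    """mv_field: per-4x4-unit motion matrix of the frame (list of rows,
    one entry per unit). Returns the subsampled grid taken every h rows
    and every w columns."""
    return [row[::w] for row in mv_field[::h]]
```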
In a feasible implementation, this step includes: determining, within the coding units of previously reconstructed image blocks preceding the image block to be processed, pixels that are located to the left of the image block to be processed and are not adjacent to it.
Specifically, the pixel at the lower-left corner of the image block to be processed may be taken as a reference point, and the line along the lower edge of the block as a reference line. One or more anchor points located to the left of the reference point and lying on the reference line are determined. The coding unit (or prediction unit) containing each anchor point is determined, and at least one of the neighboring point above the top-left corner of that coding unit and the neighboring point above its top-right corner is taken as a target pixel.
A plurality of derived lines parallel to the reference line are then determined in turn at a preset step size, the derived lines lying below the image block to be processed. Taking each derived line as the new reference line, and the intersection of the derived line with the line along the left edge of the image block to be processed as the new reference point, the step of determining target pixels is repeated to obtain at least one new target pixel.
It should be understood that when the coding unit determined from a new reference point is the same coding unit as one determined from a previous reference point, the target pixel is not obtained again.
It should also be understood that this implementation determines the positions of at least two target pixels having a preset positional relationship with the image block to be processed; the order in which they are obtained follows the description of the preset order below.
It should also be understood that, in a feasible implementation, in order not to add a new line buffer, when a target pixel position in the above embodiments lies above the image block to be processed and is one or more pixels away from the line along the top edge of the block, that target pixel position is discarded.
In a feasible implementation, in order to save motion information storage space, the position range of the second candidate pixel needs to be limited. For example, in the above coordinate system, the horizontal coordinate of the second candidate pixel may not exceed a boundary value, that is, w×i is less than or equal to a first threshold. Specifically, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
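This range limit can be applied as a simple filter over candidate positions. The sketch below approximates the w×i ≤ threshold condition as a bound on |x|, allowing one extra pixel for the -w×i-1 columns; the concrete CTU width is an illustrative assumption:

```python
def within_first_threshold(positions, first_threshold):
    """Keep only second-candidate positions whose leftward reach stays
    within the first threshold (e.g., one or two CTU widths)."""
    return [(x, y) for (x, y) in positions if -x <= first_threshold + 1]
```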
It should be understood that the above rectangular coordinate system serves only to describe the position of the second candidate pixel more clearly; in an actual embodiment there is no step of establishing a rectangular coordinate system. It should also be understood that, to describe the position of the second candidate pixel more conveniently, various coordinate systems may also be established with other pixel positions as the origin and other lines as the coordinate axes, without limitation.
The embodiments of this application do not limit the position of the first candidate pixel adjacent to the image block to be processed; exemplarily, it may be a point at any one or more of the positions indicated by 252A-252E in FIG. 8.
In a feasible implementation, there are a plurality of second candidate pixels, and obtaining the at least two target pixels having a preset positional relationship with the image block to be processed includes: obtaining the plurality of second candidate pixels among the at least two target pixels in the preset order.
It should be understood that when variable-length coding is applied to the index information of the second candidate pixels, the order of acquisition affects the bit cost of coding the index information. For example, the number of bits used to code the index information of an earlier-acquired second candidate pixel is less than or equal to the number of bits used to code the index information of a later-acquired one; that is, when the earlier-acquired second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the later-acquired second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q. Exemplarily, the binary representation may be a codeword, and the length of the binary representation is the codeword length. The target identification information in the embodiments of this application is discussed later.
In a feasible implementation, the preset order includes: order of distance from short to long, where the distance is the sum of the absolute horizontal and vertical coordinates of the second candidate pixel in the rectangular coordinate system; or right-to-left order; or top-to-bottom order; or a zigzag order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel and the pixel at the bottom-left vertex of the image block to be processed.
Exemplarily, FIG. 16 is a schematic diagram of the right-to-left order, acquired row by row from top to bottom and from right to left within each row; FIG. 17 is a schematic diagram of the top-to-bottom order, acquired column by column from right to left and from top to bottom within each column; and FIG. 18 is a schematic diagram of the order from top-right to bottom-left. The numbers in the figures represent the order of acquisition: the smaller the number, the earlier the acquisition. It should be understood that when the second candidate pixel at a certain position is not acquired, that position is skipped, and the other positions are still acquired in order of increasing number.
It should be understood that the preset orders in the embodiments of this application are not limited to the above three; for example, the orders shown in FIGS. 19-30 may also be included, without limitation.
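Two of the preset orders above reduce to simple sort keys over the FIG. 15 coordinates. This is an illustrative sketch; tie-breaking among equal-distance positions is an assumption (here, stable sort order):

```python
def order_by_distance(pts):
    """Short-to-long distance order, with distance = |x| + |y|."""
    return sorted(pts, key=lambda p: abs(p[0]) + abs(p[1]))

def order_right_to_left(pts):
    """FIG. 16-style order: row by row from top to bottom (smaller y
    first), and within each row from right to left (larger x first)."""
    return sorted(pts, key=lambda p: (p[1], -p[0]))
```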
In a feasible implementation, the obtained positions of the target pixels, or the motion information corresponding to the target pixels, are placed into a candidate motion information list. The specific implementation may be the same as the construction of the candidate motion vector list used in the Merge or AMVP modes of the aforementioned H.265 standard.
In a feasible implementation, as in the aforementioned H.265 standard, a pruning operation is performed; that is, obtaining the at least two target pixels having a preset positional relationship with the image block to be processed includes: sequentially obtaining candidate pixels having the preset positional relationship with the image block to be processed; determining that the motion information of the currently obtained candidate pixel differs from the motion information of the already obtained target pixels; and taking the candidate pixel with different motion information as a target pixel. For the specific implementation, refer to the description of the pruning operation above, which is not repeated here.
In a feasible implementation, the positions of the target pixels or the motion information corresponding to the target pixels may be placed directly into the candidate motion information list, that is, without performing the pruning operation. In this implementation, among the obtained target pixels there may be at least two whose motion information is identical, or it may be the case that no two target pixels have the same motion information.
In a feasible implementation, the number of target pixels obtained may be limited to a preset second threshold. The choice of the second threshold may depend on the specific implementation; for example, the second threshold may be chosen so that the number of target pixels obtained is fixed, or so that the number of target pixels added to the candidate motion information list is fixed, or so that the total number of pieces of motion information in the candidate motion information list is fixed.
S1402. Obtain target identification information.
The target identification information is used to determine the target motion information from the motion information corresponding to the at least two target pixels, where when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M.
The target identification information may be an index used to identify each piece of motion information in the candidate motion information list, different pieces of motion information being distinguished by different index numbers. Different index numbers have different binary representations, and a binary representation may be a codeword. In the embodiments of this application, the length of the codeword for the index number corresponding to the first candidate pixel is less than or equal to the length of the codeword for the index number corresponding to the second candidate pixel.
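One common variable-length binarization that satisfies the N ≤ M (and P ≤ Q) ordering property is truncated unary coding, shown below as a hedged example; the patent does not specify that this particular binarization is used:

```python
def truncated_unary(index, max_index):
    """Codeword for an index in 0..max_index: 'index' ones followed by
    a terminating zero, except the last index, which omits the zero.
    Earlier indices never get longer codewords than later ones."""
    if index < max_index:
        return '1' * index + '0'
    return '1' * index
```

So an index assigned to a first (adjacent) candidate placed early in the list costs no more bits than one assigned to a later, non-adjacent candidate.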
S1403. Predict the motion information of the image block to be processed according to the target motion information.
It should be understood that, compared with the Merge or AMVP techniques in the H.265 standard, the embodiments of this application add new candidate motion information; therefore, the embodiments of this application may be used as improvements to the Merge technique or the AMVP technique.
Specifically, similarly to the Merge technique, the target motion information may be used directly as the motion information of the image block to be processed. Similarly to the AMVP technique, the target motion information may be used as a predictor of the motion information of the image block to be processed; combined with a motion information difference, the motion information of the image block to be processed is obtained.
具体的,该方法可以用于解码待处理图像块,所述方法还包括:解析码流以获得目标运动残差信息;对应的,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。其中,本申请实施例中的运动信息可以指运动矢量,该步骤即为将目标标识信息指示的待处理图像块的运动矢量的预测值和解析得到的运动矢量残差值相加,获得待处理图像块的运动矢量。对应的,所述获取目标标识信息,包括:解析所述码流以获得所述目标标识信息。Specifically, the method may be used to decode an image block to be processed. The method further includes: analyzing a code stream to obtain target motion residual information; and correspondingly, predicting the image block to be processed based on the target motion information. The motion information includes: combining the target motion information and the target motion residual information to obtain motion information of the image block to be processed. Wherein, the motion information in the embodiment of the present application may refer to a motion vector. This step is to add the predicted value of the motion vector of the image block to be processed indicated by the target identification information and the residual value of the motion vector obtained by analysis to obtain the to-be-processed Image block motion vector. Correspondingly, the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
具体的,该方法可以用于编码所述目标运动信息;在所述获取目标标识信息之前,还包括:确定编码代价最小的目标运动信息和目标运动残差信息的组合;对应的,所述获取目标标识信息包括:获取所述编码代价最小的目标运动信息在所述至少两个目标运动信息中的标识信息。该方法还包括:编码所述获取的目标标识信息,以及编码所述目标运动残差信息。Specifically, the method may be used to encode the target motion information; before obtaining the target identification information, the method further includes: determining a combination of the target motion information and the target motion residual information with the least encoding cost; and correspondingly, the acquiring The target identification information includes: obtaining identification information of the target motion information with the least coding cost among the at least two target motion information. The method further includes encoding the acquired target identification information and encoding the target motion residual information.
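As a minimal sketch of the decoder-side combination and the encoder-side cost search described above, assuming motion vectors are integer pairs and approximating the true rate-distortion cost by index codeword length plus residual magnitude (the helper names and the cost model are illustrative, not the codec's actual implementation):

```python
def reconstruct_mv(mvp, mvd):
    """Decoder side: add the parsed motion vector difference (residual)
    to the predictor selected by the target identification information."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

def choose_candidate(candidates, true_mv, index_bits):
    """Encoder side: pick the (index, mvd) combination with the lowest
    cost; cost here is index codeword length plus residual magnitude,
    a crude stand-in for a real rate-distortion cost."""
    best = None
    for idx, mvp in enumerate(candidates):
        mvd = (true_mv[0] - mvp[0], true_mv[1] - mvp[1])
        cost = index_bits[idx] + abs(mvd[0]) + abs(mvd[1])
        if best is None or cost < best[0]:
            best = (cost, idx, mvd)
    return best[1], best[2]

candidates = [(4, -2), (3, 0), (16, 16)]
idx, mvd = choose_candidate(candidates, true_mv=(5, -1), index_bits=[1, 2, 2])
# The decoder, given only idx and mvd, recovers the same motion vector.
assert reconstruct_mv(candidates[idx], mvd) == (5, -1)
```

In the Merge-like case the residual is omitted entirely and `reconstruct_mv` degenerates to returning the predictor itself.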
FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of this application. Specifically, the method includes the following steps:
S3101. Determine availability of at least one target pixel point having a preset positional relationship with a to-be-processed image block.
The target pixel point includes a candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block, where, when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable.
It should be understood that the availability determination is based on factors such as the prediction mode of the image block in which the target pixel point is located, whether the target pixel point lies within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as a motion vector corresponding to another position (for example, the manner of determining candidate prediction blocks in Merge mode for rectangular partitioning in the H.265 standard). In a feasible implementation, generally, when the prediction mode of the image block in which the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point lies outside the edge of the image in which the to-be-processed image block is located, or outside the edge of the slice, the target pixel point is also unavailable.
It should be understood that the preset positional relationship may include an adjacent positional relationship and a non-adjacent positional relationship with the to-be-processed image block, as shown by way of example in FIG. 8 and FIG. 13, respectively. The embodiment shown in FIG. 14 discusses in detail the second candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block. In this embodiment of this application, the target pixel point includes the second candidate pixel point in the embodiment shown in FIG. 14, and details are not described again.
In a feasible implementation, the availability of the target pixel point may be determined, that is, the prediction mode of the image block corresponding to the coordinates of the target pixel point is checked.
In another feasible implementation, the availability of the image block in which the target pixel point is located may be determined. In this implementation, to check the prediction mode of the image block, the prediction mode of the image block corresponding to the coordinates of a point in the image block may be checked. The point may be the top-left corner point of the image block, the center point of the image block, or the target pixel point; this is not limited herein.
The availability determination condition includes: when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable. It should be understood that, when the position of the target pixel point is outside the edge of the image or outside the edge of the slice, the target pixel point does not actually exist, or the value of the target pixel point is obtained through derivation and cannot actually be measured; in this case, in a feasible implementation, the target pixel point is also considered unavailable.
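The two availability conditions above (the position falls outside the image, or its block was intra predicted) can be sketched as follows; `block_mode_at`, `INTRA`, and `INTER` are hypothetical stand-ins for the codec's internal state, not names from the specification:

```python
INTRA, INTER = "intra", "inter"

def is_available(x, y, frame_w, frame_h, block_mode_at):
    """A target pixel is unavailable if it lies outside the image (its
    value would only be derivable, not measurable) or if its block was
    intra predicted and therefore carries no motion information."""
    if not (0 <= x < frame_w and 0 <= y < frame_h):
        return False
    return block_mode_at(x, y) != INTRA

# Example: a 64x64 frame whose left half is inter coded, right half intra.
mode = lambda x, y: INTER if x < 32 else INTRA
assert is_available(10, 10, 64, 64, mode) is True
assert is_available(40, 10, 64, 64, mode) is False   # intra block
assert is_available(-1, 10, 64, 64, mode) is False   # outside the image
```

A slice-boundary check would be added in the same way as the image-boundary check, given the slice's extent.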
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
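Under the assumptions above (w and h as preset spacings, w×i bounded by the first threshold), the candidate coordinates can be enumerated as in this illustrative sketch; the loop bounds `i_max` and `j_max` are stand-ins for the thresholds, since the text bounds only w×i explicitly:

```python
def left_candidate_positions(w, h, i_max, j_max):
    """Enumerate the non-adjacent left-side candidate coordinates listed
    above, in the coordinate system whose origin is the top-left vertex
    of the current block (x positive rightward, y positive downward)."""
    points = []
    for i in range(1, i_max + 1):          # i is a positive integer
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, j_max + 1):      # j is a non-negative integer
            points.extend([(-w * i, h * j - 1), (-w * i - 1, h * j - 1),
                           (-w * i, h * j), (-w * i - 1, h * j)])
    return points

# With w = h = 8 (e.g. an 8x8 block) and a CTU-width threshold of 16,
# i may be 1 or 2 (w*i <= 16):
pts = left_candidate_positions(w=8, h=8, i_max=2, j_max=1)
assert (-8, -1) in pts and (-9, 8) in pts and all(x < 0 for x, _ in pts)
```

All enumerated x coordinates are negative, which is what makes these candidates lie to the left of the block.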
In a feasible implementation, there are a plurality of candidate pixel points, and the plurality of candidate pixel points are available. The adding the motion information corresponding to the available target pixel points to the candidate motion information set of the to-be-processed image block includes: adding the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the to-be-processed image block in a preset order, where, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
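The short-to-long distance ordering with the |x|+|y| distance can be sketched as below; the tie-breaking rule is an assumption of this sketch, since the text leaves ties open:

```python
def sort_by_distance(candidates):
    """Order candidate pixel coordinates from short to long distance,
    where distance is |x| + |y| in the block's coordinate system; ties
    are broken here by coordinate (an illustrative choice)."""
    return sorted(candidates, key=lambda p: (abs(p[0]) + abs(p[1]), p))

cands = [(-9, 8), (-1, 15), (-8, -1), (-9, -1)]
ordered = sort_by_distance(cands)
# Nearer candidates come first and thus receive shorter index codewords.
assert ordered[0] == (-8, -1)
assert [abs(x) + abs(y) for x, y in ordered] == sorted(
    abs(x) + abs(y) for x, y in cands)
```

The other preset orders (right-to-left, top-to-bottom, polyline) would simply use a different sort key over the same coordinates.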
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1401 in the embodiment shown in FIG. 14; details are not described again.
S3102. Add the motion information corresponding to the available target pixel points to a candidate motion information set of the to-be-processed image block.
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
In a feasible implementation, the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the to-be-processed image block includes: sequentially obtaining the available target pixel points; determining that the motion information of the currently obtained available target pixel point is different from the motion information in the candidate motion information set of the to-be-processed image block; and adding the available target pixel points having different motion information to the candidate motion information set of the to-be-processed image block.
In a feasible implementation, the quantity of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
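Taken together, the sequential insertion, the pruning of identical motion information, and the second-threshold cap from the preceding two implementations can be sketched as follows (a simplified model in which each piece of motion information is an integer motion-vector pair):

```python
def build_candidate_set(available_mvs, max_size):
    """Add motion information from available target pixels in order,
    skipping any motion vector already in the set (pruning), and stop
    once the preset second threshold (max_size) is reached."""
    candidate_set = []
    for mv in available_mvs:
        if mv in candidate_set:
            continue  # duplicate of earlier motion information
        candidate_set.append(mv)
        if len(candidate_set) == max_size:
            break
    return candidate_set

mvs = [(1, 0), (1, 0), (2, -3), (0, 0), (2, -3), (5, 5)]
assert build_candidate_set(mvs, max_size=3) == [(1, 0), (2, -3), (0, 0)]
```

In the non-pruning variant, the duplicate check is simply dropped and identical entries may coexist in the set.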
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1401 in the embodiment shown in FIG. 14; details are not described again.
S3103. Obtain target identification information.
The target identification information is used to determine target motion information from the candidate motion information set.
The target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, where different pieces of motion information are distinguished by different index numbers.
S3104. Predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the predicting motion information of the to-be-processed image block based on the target motion information includes: using the target motion information as the motion information of the to-be-processed image block.
In a feasible implementation, the method is used to decode the to-be-processed image block, and further includes: parsing a bitstream to obtain target motion residual information. Correspondingly, the predicting motion information of the to-be-processed image block based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block. Correspondingly, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the to-be-processed image block, and before the obtaining target identification information, the method further includes: determining a combination of target motion information and target motion residual information with a minimum coding cost. Correspondingly, the obtaining target identification information includes: obtaining identification information of the target motion information with the minimum coding cost among the at least two pieces of target motion information.
In a feasible implementation, the method further includes: encoding the obtained target identification information. The method further includes: encoding the target motion residual information.
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1403 in the embodiment shown in FIG. 14; details are not described again.
FIG. 32 is an exemplary structural block diagram of a motion information prediction apparatus 3200 according to an embodiment of this application. Specifically, the apparatus includes the following modules:
An obtaining module 3201, configured to obtain at least two target pixel points having a preset positional relationship with a to-be-processed image block, where the target pixel points include a first candidate pixel point adjacent to the to-be-processed image block and a second candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block;
An indexing module 3202, configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, where N is less than or equal to M; and
A calculation module 3203, configured to predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are a plurality of second candidate pixel points, and the obtaining module 3201 is specifically configured to obtain the plurality of second candidate pixel points among the at least two target pixel points in the preset order, where, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
In a feasible implementation, among the obtained at least two target pixel points, the motion information of at least two target pixel points is the same.
In a feasible implementation, the obtaining module 3201 is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the to-be-processed image block; determine that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel points having different motion information as the target pixel points.
In a feasible implementation, the quantity of obtained target pixel points is a preset second threshold.
In a feasible implementation, the calculation module 3203 is specifically configured to use the target motion information as the motion information of the to-be-processed image block.
In a feasible implementation, the apparatus 3200 is configured to decode the to-be-processed image block, and the indexing module 3202 is further configured to parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module 3203 is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
In a feasible implementation, the indexing module 3202 is specifically configured to parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus 3200 is configured to encode the to-be-processed image block, and the obtaining module 3201 is further configured to determine a combination of target motion information and target motion residual information with a minimum coding cost. Correspondingly, the indexing module 3202 is specifically configured to obtain identification information of the target motion information with the minimum coding cost among the at least two pieces of target motion information.
In a feasible implementation, the indexing module 3202 is further configured to encode the obtained target identification information.
In a feasible implementation, the indexing module 3202 is further configured to encode the target motion residual information.
FIG. 33 is another exemplary structural block diagram of a motion information prediction apparatus 3300 according to an embodiment of this application. Specifically, the apparatus includes the following modules:
A detection module 3301, configured to determine availability of at least one target pixel point having a preset positional relationship with a to-be-processed image block, where the target pixel point includes a candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block, and where, when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable;
An obtaining module 3302, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the to-be-processed image block;
An indexing module 3303, configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and
A calculation module 3304, configured to predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the detection module 3301 is specifically configured to determine availability of the image block in which the target pixel point is located.
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are a plurality of candidate pixel points, and the plurality of candidate pixel points are available. The obtaining module 3302 is specifically configured to add the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the to-be-processed image block in a preset order, where, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of a second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
在一种可行的实施方式中,候选运动信息集合包括至少两个相同的运动信息。In a feasible implementation manner, the candidate motion information set includes at least two identical motion information.
在一种可行的实施方式中,所述获取模块3302具体用于:依次获取所述可用的目标像素点;确定当前获取的所述可用的目标像素点的运动信息与所述待处理图像块的候选运动信息集合中的运动信息不同;将所述具有不同运动信息的可用的目标像素点加入所述待处理图像块的候选运动信息集合。In a feasible implementation manner, the obtaining module 3302 is specifically configured to sequentially obtain the available target pixel points; determine the currently obtained motion information of the available target pixel points and the The motion information in the candidate motion information set is different; the available target pixels with different motion information are added to the candidate motion information set of the image block to be processed.
在一种可行的实施方式中,所述候选运动信息集合中的运动信息的个数小于或等于预设的第二阈值。In a feasible implementation manner, the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
在一种可行的实施方式中,所述计算模块3304具体用于:将所述目标运动信息作为所述待处理图像块的运动信息。In a feasible implementation manner, the calculation module 3304 is specifically configured to use the target motion information as the motion information of the image block to be processed.
在一种可行的实施方式中,所述装置3300用于解码所述待处理图像块,所述索引模块3303还用于:解析码流以获得目标运动残差信息;对应的,所述计算模块3104具体用于:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。In a feasible implementation manner, the device 3300 is configured to decode the image block to be processed, and the indexing module 3303 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3104 is specifically configured to: combine the target motion information and the target motion residual information to obtain motion information of the image block to be processed.
In a feasible implementation, the indexing module 3303 is specifically configured to parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus 3300 is configured to encode the to-be-processed image block, and the obtaining module 3302 is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module 3303 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
In a feasible implementation, the indexing module 3303 is further configured to encode the obtained target identification information.
In a feasible implementation, the indexing module 3303 is further configured to encode the target motion residual information.
FIG. 34 is a schematic structural block diagram of a motion information prediction device 3400 in an embodiment of this application. Specifically, the device includes a processor 3401 and a memory 3402 coupled to the processor; the processor 3401 is configured to perform the embodiment shown in FIG. 14 or FIG. 32 and the various feasible implementations.
Although specific aspects of this application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of this application may be applied by many other video encoding and/or decoding units, processors, processing units, and hardware-based coding units such as encoder/decoders (CODECs) and the like. Moreover, it should be understood that the steps shown and described with respect to FIG. 14 and FIG. 32 are provided only as feasible implementations. That is, the steps shown in the feasible implementations of FIG. 14 and FIG. 32 need not necessarily be performed in the order shown, and fewer, additional, or alternative steps may be performed.
Moreover, it should be understood that, depending on the feasible implementation, certain actions or events of any of the methods described herein may be performed in a different sequence, and may be added, merged, or omitted altogether (for example, not all described actions or events are necessary to practice the methods). Furthermore, in certain feasible implementations, actions or events may be performed concurrently rather than sequentially, for example, through multi-threaded processing, interrupt processing, or multiple processors. In addition, although certain aspects of this application are described, for clarity, as being performed by a single module or unit, it should be understood that the techniques of this application may be performed by a combination of units or modules associated with a video decoder.
In one or more feasible implementations, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
In this manner, a computer-readable medium may illustratively correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures used to implement the techniques described in this application. A computer program product may include a computer-readable medium.
As a feasible implementation and not a limitation, such a computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
The foregoing descriptions are merely exemplary specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (77)
- A method for predicting motion information of an image block, comprising: obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block, the target pixels comprising a first candidate pixel adjacent to the to-be-processed image block and a second candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block; obtaining target identification information, wherein the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixels, and wherein, when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, N being less than or equal to M; and predicting the motion information of the to-be-processed image block according to the target motion information.
- The method according to claim 1, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The method according to claim 1 or 2, wherein the position of the second candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
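The coordinate scheme in the claim above can be sketched as follows. This sketch is illustrative only and not part of the claimed subject matter; the function name and the choice to enumerate a bounded range of i and j are assumptions made for demonstration:

```python
# Sketch of the second-candidate pixel positions defined in claim 3.
# Origin: the pixel at the top-left vertex of the to-be-processed block;
# x grows rightward and y grows downward, so candidates to the left of
# the block have a negative horizontal coordinate.
def second_candidate_positions(w, h, max_i, max_j):
    """Enumerate the claimed coordinate points for i in [1, max_i] and
    j in [0, max_j] (w and h are the preset positive integers)."""
    points = []
    for i in range(1, max_i + 1):
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, max_j + 1):
            points.append((-w * i,     h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i,     h * j))
            points.append((-w * i - 1, h * j))
    return points

# For an 8x8 block, every generated point lies to the left of the block:
pts = second_candidate_positions(w=8, h=8, max_i=2, max_j=2)
assert all(x < 0 for x, y in pts)
```

With w = h = 8 and i = 1, the first pattern yields (-1, 15), i.e. a pixel in the column immediately left of the block and below its bottom edge, matching the "left of and not adjacent to" candidates of claim 1.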
- The method according to claim 3, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The method according to claim 3, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The method according to any one of claims 3 to 5, wherein w×i is less than or equal to a first threshold.
- The method according to claim 6, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
- The method according to any one of claims 1 to 7, wherein there are a plurality of second candidate pixels, and the obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block comprises: obtaining the plurality of second candidate pixels among the at least two target pixels in a preset order, wherein, when an earlier-obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, P being less than or equal to Q.
- The method according to claim 8, wherein the preset order comprises: a distance order from short to long, wherein the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the top right to the bottom left.
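The "short to long" option in the claim above orders candidates by the sum of absolute coordinates, i.e. the L1 distance from the block's top-left origin, so that nearer candidates come first and can receive shorter identification codewords. A minimal sketch, illustrative only and assuming the candidate positions are already available as (x, y) tuples:

```python
def order_by_l1_distance(candidates):
    """Sort candidate pixel positions by |x| + |y|, shortest first.
    Python's sort is stable, so candidates at equal distance keep
    their original relative order."""
    return sorted(candidates, key=lambda p: abs(p[0]) + abs(p[1]))

cands = [(-17, 0), (-1, 15), (-9, 7), (-1, 23)]
# distances: 17, 16, 16, 24 -> (-1, 15) sorts first (stable tie-break)
assert order_by_l1_distance(cands)[0] == (-1, 15)
```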
- The method according to any one of claims 1 to 9, wherein, among the obtained at least two target pixels, at least two target pixels have the same motion information.
- The method according to any one of claims 1 to 10, wherein the obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block comprises: sequentially obtaining candidate pixels having the preset positional relationship with the to-be-processed image block; determining that the motion information of a currently obtained candidate pixel is different from the motion information of the already obtained target pixels; and using the candidate pixels having different motion information as the target pixels.
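The pruning step in the claim above keeps only candidates whose motion information differs from that of every target pixel already collected. The following sketch is illustrative only; it assumes motion information is comparable for equality (for example, a (mv_x, mv_y, ref_idx) tuple) and folds in the claim-12 cap on the number of target pixels:

```python
def collect_target_pixels(candidates, motion_info_of, max_count=None):
    """Scan candidate positions in order; keep a candidate only if its
    motion information exists and differs from that of all previously
    kept target pixels; optionally stop at a preset count."""
    targets, seen = [], []
    for pos in candidates:
        mi = motion_info_of(pos)
        if mi is None or mi in seen:      # unavailable or duplicate motion info
            continue
        targets.append(pos)
        seen.append(mi)
        if max_count is not None and len(targets) == max_count:
            break                          # claim 12: preset second threshold
    return targets

motion = {(-1, 7): (2, 0, 0), (-1, 15): (2, 0, 0), (-9, 7): (0, -1, 0)}
# (-1, 15) duplicates the motion info of (-1, 7) and is pruned
assert collect_target_pixels([(-1, 7), (-1, 15), (-9, 7)], motion.get) == [(-1, 7), (-9, 7)]
```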
- The method according to any one of claims 1 to 11, wherein the number of obtained target pixels is a preset second threshold.
- The method according to any one of claims 1 to 12, wherein the predicting the motion information of the to-be-processed image block according to the target motion information comprises: using the target motion information as the motion information of the to-be-processed image block.
- The method according to any one of claims 1 to 13, wherein the method is used to decode the to-be-processed image block, and further comprises: parsing a bitstream to obtain target motion residual information; and correspondingly, the predicting the motion information of the to-be-processed image block according to the target motion information comprises: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
- The method according to claim 14, wherein the obtaining target identification information comprises: parsing the bitstream to obtain the target identification information.
- The method according to any one of claims 1 to 15, wherein the method is used to encode the to-be-processed image block, and, before the obtaining target identification information, further comprises: determining the combination of target motion information and target motion residual information with the least coding cost; and correspondingly, the obtaining target identification information comprises: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- The method according to claim 16, further comprising: encoding the obtained target identification information.
- The method according to claim 16 or 17, further comprising: encoding the target motion residual information.
- A method for predicting motion information of an image block, comprising: determining the availability of at least one target pixel having a preset positional relationship with a to-be-processed image block, the target pixel comprising a candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block, wherein, when the prediction mode of the image block in which the target pixel is located is intra prediction, the target pixel is unavailable; adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block; obtaining target identification information, wherein the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the to-be-processed image block according to the target motion information.
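The construction in the claim above can be sketched as: test each target pixel's availability (a pixel in an intra-coded block is unavailable), then add the motion information of the available pixels to the candidate set. This sketch is illustrative only and not part of the claimed subject matter; the dictionary representation of blocks, modes, and motion vectors is an assumption made for demonstration, and no deduplication is performed, consistent with the base claim:

```python
def build_candidate_set(target_pixels, block_of, max_size=None):
    """For each target pixel position, look up its containing block;
    skip it when the block is missing or intra-coded (the target pixel
    is unavailable), otherwise add the block's motion information to
    the candidate motion information set."""
    candidates = []
    for pos in target_pixels:
        blk = block_of(pos)
        if blk is None or blk["mode"] == "intra":
            continue                       # target pixel is unavailable
        candidates.append(blk["mv"])
        if max_size is not None and len(candidates) == max_size:
            break                          # claim 31: preset second threshold
    return candidates

blocks = {(-9, 7): {"mode": "inter", "mv": (1, 0)},
          (-17, 7): {"mode": "intra", "mv": None}}
# the intra-coded neighbour contributes nothing to the candidate set
assert build_candidate_set([(-9, 7), (-17, 7)], blocks.get) == [(1, 0)]
```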
- The method according to claim 19, wherein the determining the availability of at least one target pixel having a preset positional relationship with a to-be-processed image block comprises: determining the availability of the image block in which the target pixel is located.
- The method according to claim 19 or 20, wherein the position of the candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The method according to claim 21, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The method according to claim 21, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The method according to any one of claims 21 to 23, wherein w×i is less than or equal to a first threshold.
- The method according to claim 24, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
- The method according to any one of claims 19 to 25, wherein there are a plurality of candidate pixels and the plurality of candidate pixels are available, and the adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block comprises: adding the motion information corresponding to the plurality of available candidate pixels to the candidate motion information set of the to-be-processed image block in a preset order, wherein, when an earlier-obtained candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, P being less than or equal to Q.
- The method according to any one of claims 19 to 26, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The method according to claim 26 or 27, wherein the preset order comprises: a distance order from short to long, wherein the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the top right to the bottom left.
- The method according to any one of claims 19 to 28, wherein the candidate motion information set includes at least two identical pieces of motion information.
- The method according to any one of claims 19 to 28, wherein the adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block comprises: sequentially obtaining the available target pixels; determining that the motion information of a currently obtained available target pixel is different from the motion information in the candidate motion information set of the to-be-processed image block; and adding the available target pixels having different motion information to the candidate motion information set of the to-be-processed image block.
- The method according to any one of claims 19 to 30, wherein the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- The method according to any one of claims 19 to 31, wherein the predicting the motion information of the to-be-processed image block according to the target motion information comprises: using the target motion information as the motion information of the to-be-processed image block.
- The method according to any one of claims 19 to 32, wherein the method is used to decode the to-be-processed image block, and further comprises: parsing a bitstream to obtain target motion residual information; and correspondingly, the predicting the motion information of the to-be-processed image block according to the target motion information comprises: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
- The method according to claim 33, wherein the obtaining target identification information comprises: parsing the bitstream to obtain the target identification information.
- The method according to any one of claims 19 to 34, wherein the method is used to encode the to-be-processed image block, and, before the obtaining target identification information, further comprises: determining the combination of target motion information and target motion residual information with the least coding cost; and correspondingly, the obtaining target identification information comprises: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- The method according to claim 35, further comprising: encoding the obtained target identification information.
- The method according to claim 35 or 36, further comprising: encoding the target motion residual information.
- An apparatus for predicting motion information of an image block, comprising: an obtaining module, configured to obtain at least two target pixels having a preset positional relationship with a to-be-processed image block, the target pixels comprising a first candidate pixel adjacent to the to-be-processed image block and a second candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block; an indexing module, configured to obtain target identification information, wherein the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixels, and wherein, when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, N being less than or equal to M; and a calculation module, configured to predict the motion information of the to-be-processed image block according to the target motion information.
- The apparatus according to claim 38, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The apparatus according to claim 38 or 39, wherein the position of the second candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The apparatus according to claim 40, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The apparatus according to claim 40, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The apparatus according to any one of claims 40 to 42, wherein w×i is less than or equal to a first threshold.
- 根据权利要求43所述的装置,其特征在于,所述第一阈值等于所述待处理图像块所在的编码树单元CTU的宽度,或者,所述第一阈值等于所述CTU的宽度的2倍。The apparatus according to claim 43, wherein the first threshold is equal to a width of a coding tree unit CTU in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU .
- 根据权利要求38至44任一项所述的装置,其特征在于,所述第二候选像素点为多个,所述获取模块具体用于:按照所述预设顺序获取所述至少两个目标像素点中的多个第二候选像素点,其中,当所述在先获取的第二候选像素点对应所述目标运动信息时,所述目标标识信息的二进制表示的长度为P,当所述在后获取的第二候选像素点对应所述目标运动信息时,所述目标标识信息的二进制表示的长度为Q,P小于或等于Q。The device according to any one of claims 38 to 44, wherein the second candidate pixel point is multiple, and the obtaining module is specifically configured to obtain the at least two targets in the preset order. A plurality of second candidate pixels among the pixels, wherein when the previously obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the When the second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- The apparatus according to claim 45, wherein the preset order comprises: a short-to-long distance order, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the upper right to the lower left.
- The apparatus according to any one of claims 38 to 46, wherein, among the obtained at least two target pixel points, the motion information of at least two target pixel points is the same.
- The apparatus according to any one of claims 38 to 47, wherein the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of the currently obtained candidate pixel point differs from the motion information of the already obtained target pixel points; and use the candidate pixel points having different motion information as the target pixel points.
- The apparatus according to any one of claims 38 to 48, wherein the number of obtained target pixel points is a preset second threshold.
- The apparatus according to any one of claims 38 to 49, wherein the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- The apparatus according to any one of claims 38 to 50, wherein the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to parse a bitstream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- The apparatus according to claim 51, wherein the indexing module is specifically configured to parse the bitstream to obtain the target identification information.
- The apparatus according to any one of claims 38 to 52, wherein the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the least coding cost.
- The apparatus according to claim 53, wherein the indexing module is further configured to encode the obtained target identification information.
- The apparatus according to claim 53 or 54, wherein the indexing module is further configured to encode the target motion residual information.
- An apparatus for predicting motion information of an image block, comprising: a detection module, configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, the target pixel point including a candidate pixel point located to the left of the image block to be processed and not adjacent to the image block to be processed, wherein when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable; an obtaining module, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; an indexing module, configured to obtain target identification information used to determine target motion information from the candidate motion information set; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
- The apparatus according to claim 56, wherein the detection module is specifically configured to determine the availability of the image block in which the target pixel point is located.
- The apparatus according to claim 56 or 57, wherein the position of the candidate pixel point comprises at least one of the following coordinate points in a rectangular coordinate system whose origin is the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block to be processed lies (rightward being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block to be processed lies (downward being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The apparatus according to claim 58, wherein w is the width of the image block to be processed and h is the height of the image block to be processed.
- The apparatus according to claim 58, wherein a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The apparatus according to any one of claims 58 to 60, wherein w×i is less than or equal to a first threshold.
- The apparatus according to claim 61, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- The apparatus according to any one of claims 56 to 62, wherein there are multiple candidate pixel points and the multiple candidate pixel points are available, and the obtaining module is specifically configured to add, in a preset order, the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed, wherein when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- The apparatus according to any one of claims 56 to 63, wherein the binary representation of the target identification information comprises an encoded codeword of the target identification information.
- The apparatus according to claim 63 or 64, wherein the preset order comprises: a short-to-long distance order, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the upper right to the lower left.
- The apparatus according to any one of claims 56 to 65, wherein the candidate motion information set includes at least two pieces of identical motion information.
- The apparatus according to any one of claims 56 to 65, wherein the obtaining module is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of the currently obtained available target pixel point differs from the motion information in the candidate motion information set of the image block to be processed; and add the available target pixel points having different motion information to the candidate motion information set of the image block to be processed.
- The apparatus according to any one of claims 56 to 67, wherein the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- The apparatus according to any one of claims 56 to 68, wherein the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- The apparatus according to any one of claims 56 to 69, wherein the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to parse a bitstream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- The apparatus according to claim 70, wherein the indexing module is specifically configured to parse the bitstream to obtain the target identification information.
- The apparatus according to any one of claims 56 to 71, wherein the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the least coding cost.
- The apparatus according to claim 72, wherein the indexing module is further configured to encode the obtained target identification information.
- The apparatus according to claim 72 or 73, wherein the indexing module is further configured to encode the target motion residual information.
- A device for predicting motion information of an image block, comprising a processor and a memory coupled to the processor, the processor being configured to execute the method according to any one of claims 1 to 55.
- A computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 55.
- A computer program product containing instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 55.
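As an illustrative aside (not part of the claims), the candidate-pixel coordinate templates of claims 40 and 58 and the short-to-long distance order of claims 46 and 65 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `candidate_positions` and the `max_j` bound on j are our own choices for illustration, and the claims themselves do not fix an enumeration algorithm.

```python
# Hypothetical sketch: enumerate the six coordinate templates of the
# claims for each (i, j), bound w*i by the first threshold (claims 43
# and 61), and order the points by the short-to-long distance rule
# (sum of absolute horizontal and vertical coordinates).

def candidate_positions(w, h, first_threshold, max_j=2):
    """Candidate pixel coordinates relative to the top-left vertex of
    the image block (origin; x grows rightward, y grows downward)."""
    points = []
    i = 1
    while w * i <= first_threshold:      # w*i <= first threshold
        for j in range(0, max_j + 1):    # j is a non-negative integer
            points.extend([
                (-1, h * i - 1 + h),
                (-1, h * i + h),
                (-w * i, h * j - 1),
                (-w * i - 1, h * j - 1),
                (-w * i, h * j),
                (-w * i - 1, h * j),
            ])
        i += 1
    # Deduplicate, then apply the short-to-long distance order:
    # distance = |x| + |y|, ascending.
    return sorted(set(points), key=lambda p: abs(p[0]) + abs(p[1]))
```

For a 4×4 block with the first threshold equal to the block width, the nearest point produced is (-4, 0), i.e. the pixel immediately left of the block's top row at the first non-adjacent column offset; points farther left or farther down receive later (and hence longer-codeword) positions, consistent with the P ≤ Q property of claims 45 and 63.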
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020024275A1 true WO2020024275A1 (en) | 2020-02-06 |
Family
ID=69232328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020024275A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101605256A (en) * | 2008-06-12 | 2009-12-16 | 华为技术有限公司 | A kind of method of coding and decoding video and device |
CN104054350A (en) * | 2011-11-04 | 2014-09-17 | 诺基亚公司 | Method for video coding and an apparatus |
CN104539966A (en) * | 2014-09-30 | 2015-04-22 | 华为技术有限公司 | Image prediction method and relevant device |
- 2018-08-03 WO PCT/CN2018/098581 patent/WO2020024275A1/en active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7211816B2 (en) | Intra-block copy-merge mode and padding for unavailable IBC reference areas | |
TWI846773B (en) | Triangle motion information for video coding | |
TWI843809B (en) | Signalling for merge mode with motion vector differences in video coding | |
CN109996081B (en) | Image prediction method, device and coder-decoder | |
JP2019508971A (en) | Predicting filter coefficients from fixed filters for video coding | |
JP2019519141A (en) | Signaling of filtering information | |
KR20130126688A (en) | Motion vector prediction | |
CN115361564B (en) | Motion vector acquisition method, device, equipment and computer readable storage medium | |
CN111200735B (en) | Inter-frame prediction method and device | |
WO2020048180A1 (en) | Motion vector acquisition method, device, computer equipment and storage medium | |
US20210203944A1 (en) | Decoding method and decoding apparatus for predicting motion information | |
KR20230098705A (en) | Sub-block temporal motion vector prediction for video coding | |
CN113170141B (en) | Inter-frame prediction method and related device | |
TW201921938A (en) | Adaptive GOP structure with future reference frame in random access configuration for video coding | |
CN111919439B (en) | Method and device for acquiring motion vector, method and device for constructing motion vector set and computer-readable storage medium | |
CN110546956B (en) | Inter-frame prediction method and device | |
CN110876057B (en) | Inter-frame prediction method and device | |
WO2020024275A1 (en) | Inter-frame prediction method and device | |
CN110855993A (en) | Method and device for predicting motion information of image block | |
WO2020038232A1 (en) | Method and apparatus for predicting movement information of image block | |
WO2020052653A1 (en) | Decoding method and device for predicted motion information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18928772; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18928772; Country of ref document: EP; Kind code of ref document: A1 |