WO2020024275A1 - Inter-frame prediction method and device - Google Patents
Inter-frame prediction method and device
- Publication number
- WO2020024275A1 (PCT/CN2018/098581)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- motion information
- image block
- processed
- candidate
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/513—Processing of motion vectors
- H04N19/517—Processing of motion vectors by encoding
- H04N19/52—Processing of motion vectors by encoding by predictive encoding
Definitions
- the present application relates to the technical field of video images, and in particular, to a method and a device for inter prediction.
- Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, cellular or satellite radio telephones, video conference devices, video streaming devices, and the like.
- Digital video equipment implements video compression technologies, such as those described in the ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC) standards and the extensions of those standards, to transmit and receive digital video information more efficiently.
- Video devices can implement these video codec technologies to more efficiently transmit, receive, encode, decode, and / or store digital video information.
- Video compression techniques perform spatial (intra-image) prediction and / or temporal (inter-image) prediction to reduce or remove redundancy inherent in video sequences.
- a video slice may be divided into video blocks, where a video block may also be referred to as a tree block, a coding unit (CU), and/or a decoding node.
- Video blocks in an intra-coded (I) slice of an image are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same image.
- Video blocks in an inter-coded (P or B) slice of an image may use spatial prediction with respect to reference samples in neighboring blocks in the same image, or temporal prediction with respect to reference samples in other reference images.
- An image may be referred to as a frame, and a reference image may be referred to as a reference frame.
- the embodiments of the present application provide a method and an apparatus for inter prediction that select suitable candidate motion information as the motion information prediction value of an image block to be processed, improving the effectiveness of motion information prediction and the encoding and decoding efficiency.
- the motion information includes motion vectors and index information of a reference frame pointed to by the motion vectors, and the like.
- the prediction of the motion information refers to the prediction of a motion vector.
- a method for predicting motion information of an image block, including: obtaining at least two target pixel points having a preset position relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of, and not adjacent to, the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and predicting the motion information of the image block to be processed according to the target motion information.
- the beneficial effect of this implementation mode is that motion information of a non-adjacent image block on the left side of a block to be processed is used as candidate motion information of the block to be processed, and more spatially prior coding information is used to improve coding performance.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the beneficial effect of this implementation mode is that, when the candidate prediction motion information is expressed with a variable-length encoding method, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the target pixel points and the motion information of the image block to be processed, properly determining the acquisition order of the target pixel points helps select a better codeword encoding strategy and improves encoding performance.
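As an illustrative sketch (not taken from the patent itself), a truncated-unary binarization shows how a variable-length scheme assigns shorter codewords to candidates earlier in the order, which is how the constraint N ≤ M can arise; the function name and the choice of truncated unary are assumptions:

```python
def truncated_unary(index, max_index):
    """Binarize a candidate index with truncated unary coding:
    index k -> k ones followed by a terminating zero, with the zero
    dropped for the last index. Earlier indices get shorter codewords."""
    if index < max_index:
        return "1" * index + "0"
    return "1" * index

# Earlier-listed candidates receive shorter (or equal) codewords,
# matching the claimed relationship N <= M.
codewords = [truncated_unary(k, 4) for k in range(5)]
```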
- the position of the second candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the second candidate pixel point is located at one of a set of preset coordinate points in this orthogonal coordinate system.
- the beneficial effect of this implementation mode is that it provides multiple possibilities for the selection of the second candidate pixel point according to the actual coding requirements, and can achieve a balance between performance, complexity, and software and hardware consumption.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the beneficial effect of this implementation mode is that the position of the second candidate pixel point is selected according to the size of the image block to be processed, which is consistent with the local motion characteristics of the image block to be processed, making the selection more reasonable.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- the beneficial effect of this implementation mode is that the selection of the position of the second candidate pixel point and the distribution of the motion information of the motion vector field are kept consistent, and the balance of position selection is ensured.
- w × i is less than or equal to a first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- This embodiment has the beneficial effect of limiting the selection range of the position of the second candidate pixel point, and ensuring the balance between the coding performance and the storage space.
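A minimal sketch of how such a range limit might be applied when enumerating left-side candidate positions; the coordinate pattern (-w·i, h·j), the number of vertical offsets, and the function name are illustrative assumptions, not the patent's definitive procedure:

```python
def left_candidate_positions(w, h, ctu_width, use_double=False):
    """Enumerate hypothetical left-side candidate coordinates (-w*i, h*j)
    in the block-relative coordinate system, keeping only those with
    w*i <= first threshold (the CTU width, optionally doubled), as the
    claim limits the horizontal selection range."""
    threshold = 2 * ctu_width if use_double else ctu_width
    positions = []
    i = 1
    while w * i <= threshold:
        for j in range(3):  # a few vertical offsets, for illustration only
            positions.append((-w * i, h * j))
        i += 1
    return positions
```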
- the acquiring of at least two target pixel points having a preset position relationship with the image block to be processed includes: acquiring, in a preset order, multiple second candidate pixel points among the at least two target pixel points, where, when a second candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a top-to-bottom zigzag (polyline) order.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- the beneficial effect of this implementation mode is that, when a variable-length coding method is adopted for representing the motion information corresponding to each second candidate pixel point, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the second candidate pixel points and the motion information of the image block to be processed, appropriately determining the acquisition order helps select a better codeword coding strategy and improves coding performance.
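The short-to-long distance order can be sketched as a sort by Manhattan distance; this is an illustrative helper, with tie-breaking behavior assumed (stable sort keeps the input order) rather than specified by the patent:

```python
def order_candidates(points):
    """Order candidate points from short to long distance, where the
    distance is |x| + |y| (the sum of the absolute values of the
    coordinates, as in the claimed short-to-long order). Python's sort
    is stable, so ties keep their original relative order."""
    return sorted(points, key=lambda p: abs(p[0]) + abs(p[1]))
```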
- motion information of at least two target pixel points is the same.
- the beneficial effect of this implementation mode is that the pruning operation is not performed when constructing the candidate motion information list, which reduces complexity.
- the acquiring of at least two target pixel points having a preset position relationship with the image block to be processed includes: sequentially obtaining candidate pixel points having the preset position relationship with the image block to be processed; determining that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and using the candidate pixel points with different motion information as the target pixel points.
- the beneficial effect of this implementation mode is that redundant information in the candidate motion information list is removed through a pruning operation, and encoding efficiency is improved.
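A hypothetical sketch of this pruning step: each newly acquired candidate's motion information is compared against the list built so far, duplicates are dropped, and the list is capped (the `max_count` parameter stands in for the preset second threshold mentioned below); names and structure are illustrative:

```python
def prune_candidates(candidates, max_count):
    """Build the candidate motion-information list with pruning: a newly
    acquired candidate is appended only if its motion information differs
    from every entry already in the list, and the list is capped at a
    preset maximum number of entries."""
    result = []
    for mv in candidates:
        if mv not in result and len(result) < max_count:
            result.append(mv)
    return result
```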
- the number of the obtained target pixel points is a preset second threshold.
- the beneficial effect of this implementation mode is that, by limiting the number of acquired target pixel points, encoding performance and software and hardware consumption are balanced; in some specific implementation modes, the instability of the decoding system caused by an uncertain total number of entries in the candidate motion information list is also avoided.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting of the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
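The combination of predictor and residual can be sketched as a component-wise sum, as in AMVP-style motion vector reconstruction; the exact combination rule used by the patent may differ, and the names here are illustrative:

```python
def reconstruct_motion_vector(predictor, residual):
    """Combine the target motion information (the motion vector predictor)
    with the decoded motion residual component-wise to obtain the motion
    vector of the image block to be processed."""
    return (predictor[0] + residual[0], predictor[1] + residual[1])
```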
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining of the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the obtained target identification information is encoded.
- the target motion residual information is encoded.
- the above various feasible embodiments apply the motion vector prediction method of the present application to decoding and encoding methods for obtaining the motion vectors of an image block to be processed, namely the merge prediction mode (Merge) and the advanced motion vector prediction (AMVP) mode.
- a method for predicting motion information of an image block, including: determining the availability of at least one target pixel point having a preset position relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of, and not adjacent to, the image block to be processed, and where, when the prediction mode of the image block where a target pixel point is located is intra prediction, that target pixel point is unavailable; adding motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the image block to be processed according to the target motion information.
- the beneficial effect of this implementation mode is that motion information of a non-adjacent image block on the left side of a block to be processed is used as candidate motion information of the block to be processed, and more spatially prior coding information is used to improve coding performance.
- the determining the availability of at least one target pixel point having a preset position relationship with the image block to be processed includes determining the availability of the image block where the target pixel point is located.
- the judgment of availability is based on factors such as the prediction mode of the image block where the target pixel point is located, whether the target pixel point is within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vector corresponding to another position (for example, in the H.265 standard, how the candidate prediction block of the Merge mode is determined for the rectangular block mode).
- when the prediction mode of the image block where the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point is located outside the image edge or the slice edge of the image where the image block to be processed is located, the target pixel point is unavailable.
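A simplified availability check following the two rules above (intra-predicted source block, or position outside the image); slice-boundary handling is omitted, and all names and the image-relative coordinate convention are illustrative assumptions:

```python
def is_available(pixel, image_width, image_height, prediction_mode):
    """Judge target-pixel availability: unavailable if the block the pixel
    belongs to was intra predicted, or if the pixel lies outside the image
    boundary. Slice-edge checks would be handled analogously."""
    x, y = pixel
    if prediction_mode == "intra":
        return False
    if not (0 <= x < image_width and 0 <= y < image_height):
        return False
    return True
```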
- the position of the candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the candidate pixel point is located at at least one of the following coordinate points in this orthogonal coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- the beneficial effect of this implementation mode is that it provides multiple possibilities for the selection of candidate pixels according to actual coding requirements, and can achieve a balance between performance, complexity, and software and hardware consumption.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the beneficial effect of this implementation mode is that the positions of candidate pixel points are selected according to the size of the image block to be processed, which conforms to the local motion characteristics of the image block to be processed, making the selection more reasonable.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- the beneficial effect of this implementation mode is that the selection of the positions of the candidate pixels is consistent with the distribution of the motion information of the motion vector field, and the balance of position selection is ensured.
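The sampling of a per-pixel motion information matrix into a motion vector field at width interval w and height interval h can be sketched with simple stride slicing; this grid-subsampling interpretation and the function name are assumptions:

```python
def sample_motion_field(motion_matrix, w, h):
    """Subsample a per-pixel motion-information matrix (a list of rows)
    at width interval w and height interval h, producing the motion
    vector field with which the candidate positions are kept aligned."""
    return [row[::w] for row in motion_matrix[::h]]
```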
- w × i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the beneficial effect of this implementation mode is to limit the selection range of candidate pixel positions, and ensure the balance between encoding performance and storage space.
- the candidate motion information set is constructed by adding motion information corresponding to multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where, when a candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is the length of the straight line segment connecting the candidate pixel point and the pixel point at the lower-left vertex of the image block to be processed.
- the beneficial effect of this implementation mode is that, when a variable-length coding method is used for representing the motion information corresponding to each candidate pixel point, motion information earlier in the order is encoded with a shorter codeword and motion information later in the order with a longer codeword. Given the correlation between the motion information of the candidate pixel points and the motion information of the image block to be processed, properly determining the acquisition order helps select a better codeword encoding strategy and improves encoding performance.
- the candidate motion information set includes at least two identical pieces of motion information.
- the beneficial effect of this implementation mode is that the pruning operation is not performed when constructing the candidate motion information list, which reduces complexity.
- the adding of the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information already in the candidate motion information set of the image block to be processed; and adding the differing motion information of the available target pixel points to the candidate motion information set of the image block to be processed.
- the beneficial effect of this implementation mode is that redundant information in the candidate motion information list is removed through a pruning operation, and encoding efficiency is improved.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the beneficial effect of this implementation mode is that, by limiting the number of acquired target pixel points, encoding performance and software and hardware consumption are balanced; in some specific implementation modes, the instability of the decoding system caused by an uncertain total number of entries in the candidate motion information list is also avoided.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting of the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining of the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
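The encoder-side selection of the least-cost combination can be sketched with a toy cost model: residual magnitude plus a lambda-weighted candidate index standing in for the identification codeword length. This cost model is an illustrative assumption, not the patent's actual rate-distortion criterion:

```python
def select_best_candidate(candidates, true_mv, lam=4):
    """Pick the index of the candidate whose (identification, residual)
    combination has the least coding cost, modelled here as the residual
    magnitude plus lambda times the candidate index (a stand-in for the
    identification codeword length)."""
    best_index, best_cost = 0, float("inf")
    for idx, mv in enumerate(candidates):
        residual = abs(true_mv[0] - mv[0]) + abs(true_mv[1] - mv[1])
        cost = residual + lam * idx
        if cost < best_cost:
            best_index, best_cost = idx, cost
    return best_index
```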
- the method further includes: encoding the obtained target identification information.
- the method further includes: encoding the target motion residual information.
- the above various feasible embodiments apply the motion vector prediction method of the present application to decoding and encoding methods for obtaining the motion vectors of an image block to be processed, namely the merge prediction mode and the advanced motion vector prediction mode, improving the encoding performance and efficiency of the original methods.
- a device for predicting motion information, including: an acquisition module, configured to acquire at least two target pixel points having a preset position relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of, and not adjacent to, the image block to be processed; an index module, configured to obtain target identification information used to determine target motion information from the motion information corresponding to the at least two target pixel points, where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the position of the second candidate pixel point is defined as follows: the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the line where the upper edge of the image block to be processed is located is the horizontal axis with right as the horizontal positive direction, and the line where the left edge of the image block to be processed is located is the vertical axis with downward as the vertical positive direction; the second candidate pixel point is located at one of a set of preset coordinate points in this orthogonal coordinate system.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w × i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the obtaining module is specifically configured to obtain, in the preset order, multiple second candidate pixel points among the at least two target pixel points, where, when a second candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a top-to-bottom zigzag (polyline) order.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- motion information of at least two target pixel points is the same.
- the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset position relationship with the image block to be processed; determine that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and use the candidate pixel points with different motion information as the target pixel points.
- the number of the obtained target pixel points is a preset second threshold.
- the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device is configured to decode the image block to be processed, and the index module is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the index module is specifically configured to parse the code stream to obtain the target identification information.
- the device is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the index module is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the index module is further configured to encode the obtained target identification information.
- the index module is further configured to encode the target motion residual information.
- a device for predicting motion information, including: a detection module, configured to determine the availability of at least one target pixel point having a preset position relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of, and not adjacent to, the image block to be processed.
- an acquisition module is configured to add the available motion information corresponding to the target pixel to the candidate motion information set of the image block to be processed;
- an index module is configured to acquire target identification information, where the target identification information is used to determine target motion information from the candidate motion information set;
- a calculation module is configured to predict motion information of the image block to be processed according to the target motion information.
- the detection module is specifically configured to determine availability of an image block where the target pixel point is located.
- the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel point at the upper-left vertex of the image block to be processed is the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling a motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w×i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
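The candidate-position templates and the first-threshold constraint above can be sketched as follows. The enumeration bounds `max_i` and `max_j` are illustrative assumptions, not values given in the text:

```python
def left_candidate_positions(w, h, ctu_width, max_i=8, max_j=4):
    """Enumerate candidate pixel positions to the left of the image block to
    be processed, in the coordinate system whose origin is the block's
    upper-left pixel (x grows rightward, y grows downward), following the
    position templates listed above and stopping once w*i exceeds the
    first threshold (here taken as the CTU width)."""
    positions = []
    i = 1
    while i <= max_i and w * i <= ctu_width:  # first-threshold constraint
        for j in range(0, max_j + 1):
            positions.extend([
                (-w * i,     h * j - 1),
                (-w * i - 1, h * j - 1),
                (-w * i,     h * j),
                (-w * i - 1, h * j),
            ])
        positions.extend([(-1, h * i - 1 + h), (-1, h * i + h)])
        i += 1
    return positions
```

With w = h = 4 and a CTU width of 8, only i = 1 and i = 2 satisfy the constraint, so candidate columns stop two block-widths to the left.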
- the obtaining module is specifically configured to add the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the image block to be processed according to a preset order, wherein when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute values of the horizontal and vertical coordinates of a second candidate pixel point in the rectangular coordinate system.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
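The two distance definitions above, and the short-to-long ordering they induce, can be sketched as follows (the choice of (0, block_height - 1) as the lower-left vertex pixel is an assumption in the top-left-origin coordinate system):

```python
import math

def l1_distance(p):
    """Distance variant (a): sum of the absolute values of the horizontal
    and vertical coordinates of the candidate pixel point."""
    return abs(p[0]) + abs(p[1])

def lower_left_distance(p, block_height):
    """Distance variant (b): length of the straight line segment connecting
    the candidate pixel point and the pixel at the lower-left vertex of the
    image block to be processed, assumed here to be (0, block_height - 1)."""
    return math.hypot(p[0], p[1] - (block_height - 1))

def sort_short_to_long(points, key):
    """Order candidates from short to long distance, so earlier candidates
    can receive the shorter identification codewords (P <= Q)."""
    return sorted(points, key=key)
```

For instance, under the L1 distance the point (-1, 0) is ordered before (-5, 0), so it would be assigned the shorter codeword.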
- the candidate motion information set includes at least two pieces of identical motion information.
- the acquiring module is specifically configured to: sequentially acquire the available target pixel points; determine that the motion information of the currently acquired available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and add the motion information of the available target pixel points having different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
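The sequential acquisition with duplicate pruning and the second-threshold cap described above can be sketched as (the tuple representation of motion information is an assumption):

```python
def build_candidate_set(available_motion_infos, second_threshold):
    """Sequentially add the motion information of available target pixel
    points to the candidate motion information set, keeping only motion
    information that differs from what is already in the set, and stopping
    once the preset second threshold is reached."""
    candidates = []
    for info in available_motion_infos:
        if len(candidates) >= second_threshold:
            break
        if info not in candidates:  # only differing motion information
            candidates.append(info)
    return candidates
```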
- the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device is configured to decode the image block to be processed, and the indexing module is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the indexing module is specifically configured to: parse the code stream to obtain the target identification information.
- the device is configured to encode the image block to be processed, and the obtaining module is further configured to determine a combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module is further configured to: encode the obtained target identification information.
- the indexing module is further configured to: encode the target motion residual information.
- a device for predicting motion information, including a processor and a memory coupled to the processor, where the processor is configured to execute the method described in the first aspect or the second aspect above.
- a computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the method described in the first aspect or the second aspect above.
- a computer program product containing instructions is provided, and when the instructions are run on a computer, the computer is caused to execute the method described in the first aspect or the second aspect above.
- FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of the present application
- FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of the present application.
- FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of the present application.
- FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
- FIG. 5 is an exemplary flowchart of a merge prediction mode according to an embodiment of the present application.
- FIG. 6 is an exemplary flowchart of an advanced motion vector prediction mode according to an embodiment of the present application.
- FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder in an embodiment of the present application.
- FIG. 8 is an exemplary schematic diagram of a coding unit and an adjacent position image block associated with the coding unit in the embodiment of the present application;
- FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of the present application.
- FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application
- FIG. 13 is another exemplary schematic diagram of a coding unit and an adjacent position image block associated with the coding unit in the embodiment of the present application;
- FIG. 14 is an exemplary flowchart of a method for predicting motion information according to an embodiment of the present application.
- FIG. 15 is an exemplary schematic diagram of an image block to be processed and an image block of an adjacent position associated with the image block to be processed in the embodiment of the present application;
- FIG. 16 is an exemplary schematic diagram of an acquisition sequence from right to left in an embodiment of the present application.
- FIG. 17 is an exemplary schematic diagram of an acquisition sequence from top to bottom in an embodiment of the present application.
- FIG. 18 is an exemplary schematic diagram of an obtaining sequence from the upper right to the lower left in the embodiment of the present application.
- FIGS. 19 to 30 are exemplary schematic diagrams of different acquisition sequences in the embodiment of the present application.
- FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of the present application.
- FIG. 32 is a block diagram of an exemplary structure of a motion information prediction apparatus according to an embodiment of the present application.
- FIG. 33 is another exemplary structural block diagram of a motion information prediction apparatus according to an embodiment of the present application.
- FIG. 34 is a schematic structural block diagram of a motion information prediction device in an embodiment of the present application.
- FIG. 1 is a block diagram of a video decoding system 1 according to an example described in the embodiment of the present application.
- video coder generally refers to both video encoders and video decoders.
- video coding or “coding” may generally refer to video encoding or video decoding.
- the video encoder 100 and the video decoder 200 of the video decoding system 1 are configured to predict the motion information of a current coded image block or its sub-blocks according to any of a variety of new inter prediction modes proposed in the present application, so that the predicted motion vector is as close as possible to the motion vector obtained using a motion estimation method; in this way, the motion vector difference need not be transmitted during encoding, thereby further improving the encoding and decoding performance.
- the video decoding system 1 includes a source device 10 and a destination device 20.
- the source device 10 generates encoded video data. Therefore, the source device 10 may be referred to as a video encoding device.
- the destination device 20 may decode the encoded video data generated by the source device 10. Therefore, the destination device 20 may be referred to as a video decoding device.
- Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors.
- the memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other media that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
- the source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, televisions, cameras, display devices, digital media players, video game consoles, on-board computers, or the like.
- the destination device 20 may receive the encoded video data from the source device 10 via the link 30.
- the link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20.
- the link 30 may include one or more communication media enabling the source device 10 to directly transmit the encoded video data to the destination device 20 in real time.
- the source device 10 may modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and may transmit the modulated video data to the destination device 20.
- the one or more communication media may include wireless and / or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (eg, the Internet).
- the one or more communication media may include a router, a switch, a base station, or other devices that facilitate communication from the source device 10 to the destination device 20.
- the encoded data may be output from the output interface 140 to the storage device 40.
- the encoded data can be accessed from the storage device 40 through the input interface 240.
- the storage device 40 may include any of a variety of distributed or locally-accessed data storage media, such as a hard drive, Blu-ray disc, DVD, CD-ROM, flash memory, volatile or non-volatile memory, Or any other suitable digital storage medium for storing encoded video data.
- the storage device 40 may correspond to a file server or another intermediate storage device that may hold the encoded video produced by the source device 10.
- the destination device 20 may access the stored video data from the storage device 40 via streaming or download.
- the file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20.
- Example file servers include a web server (eg, for a website), an FTP server, a network attached storage (NAS) device, or a local disk drive.
- the destination device 20 can access the encoded video data through any standard data connection, including an Internet connection.
- This may include a wireless channel (e.g., Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on a file server.
- the transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
- the motion vector prediction technology of the present application can be applied to video codecs to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., via the Internet), for storage in data storage Encoding of video data on media, decoding of video data stored on data storage media, or other applications.
- the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and / or video telephony.
- the video decoding system 1 illustrated in FIG. 1 is merely an example, and the techniques of the present application can be applied to video decoding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device.
- data is retrieved from local storage, streamed over a network, and so on.
- the video encoding device may encode the data and store the data to a memory, and / or the video decoding device may retrieve the data from the memory and decode the data.
- encoding and decoding are performed by devices that do not communicate with each other, but only encode data to and / or retrieve data from memory and decode data.
- the source device 10 includes a video source 120, a video encoder 100, and an output interface 140.
- the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter.
- Video source 120 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of these sources of video data.
- the video encoder 100 may encode video data from the video source 120.
- the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140.
- the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and / or playback.
- the destination device 20 includes an input interface 240, a video decoder 200, and a display device 220.
- the input interface 240 includes a receiver and / or a modem.
- the input interface 240 may receive the encoded video data via the link 30 and / or from the storage device 40.
- the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. Generally, the display device 220 displays decoded video data.
- the display device 220 may include various display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or other types of display devices.
- video encoder 100 and video decoder 200 may each be integrated with an audio encoder and decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle encoding of both audio and video in a common or separate data stream.
- the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or other protocols such as the User Datagram Protocol (UDP), if applicable.
- Video encoder 100 and video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the technology of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be considered as one or more processors. Each of video encoder 100 and video decoder 200 may be included in one or more encoders or decoders, any of which may be integrated in a corresponding device as part of a combined encoder/decoder (codec).
- This application may generally refer to video encoder 100 as “signaling” or “transmitting” certain information to another device, such as video decoder 200.
- the terms “signaling” or “transmitting” may generally refer to the transmission of syntax elements and/or other data used to decode the compressed video data. This transfer can occur in real time or almost in real time. Alternatively, this communication may occur over a period of time, such as when a syntax element is stored in a coded stream to a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax element at any time after the syntax element is stored on this medium.
- the video encoder 100 and the video decoder 200 may operate according to a video compression standard such as high efficiency video coding (HEVC, H.265), and may conform to the HEVC test model (HM).
- the latest standard document of H.265 can be obtained from http://www.itu.int/rec/T-REC-H.265.
- the latest version of the standard document is H.265 (12/16), and the standard document is incorporated herein by reference in its entirety.
- HM assumes that video decoding devices have several additional capabilities over existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra-prediction encoding modes, while HM provides up to 35 intra-prediction encoding modes.
- the H.266 test model is the evolution model of the video decoding device.
- the algorithm description of H.266 can be obtained from http://phenix.int-evry.fr/jvet. The latest algorithm description is included in JVET-F1001-v2.
- the algorithm description document is incorporated herein by reference in its entirety.
- reference software for the JEM test model can be obtained from https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is also incorporated herein by reference in its entirety.
- HM can divide a video frame or image into a sequence of tree blocks or maximum coding units (LCUs) containing both luminance and chrominance samples.
- LCUs are also known as coding tree units (CTUs).
- the tree block has a similar purpose as the macro block of the H.264 standard.
- a slice contains several consecutive tree blocks in decoding order.
- a video frame or image can be split into one or more slices.
- Each tree block can be split into coding units according to a quadtree. For example, a tree block that is a root node of a quad tree may be split into four child nodes, and each child node may be a parent node and split into another four child nodes.
- the final indivisible child nodes that are leaf nodes of the quadtree include decoding nodes, such as decoded video blocks.
- the syntax data associated with the decoded codestream can define the maximum number of times a tree block can be split, and can also define the minimum size of a decoding node.
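The quadtree splitting described above, bounded by the maximum split count and minimum decoding-node size carried in the syntax data, can be sketched as follows. The `decide_split` callback stands in for the encoder's mode decision and is a hypothetical name:

```python
def quadtree_split(x, y, size, min_size, depth, max_depth, decide_split):
    """Recursively split a tree block into decoding nodes. A node becomes a
    leaf when the maximum depth is reached, a further split would violate
    the minimum node size, or the mode decision declines to split."""
    if depth >= max_depth or size // 2 < min_size or not decide_split(x, y, size):
        return [(x, y, size)]  # leaf node: a decoding node
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree_split(x + dx, y + dy, half, min_size,
                                 depth + 1, max_depth, decide_split)
    return leaves
```

Splitting a 64×64 tree block once, for example, yields four 32×32 decoding nodes.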
- the coding unit (CU) includes a decoding node, and a prediction unit (PU) and a transform unit (TU) associated with the decoding node.
- the size of the CU corresponds to the size of the decoding node and the shape must be square.
- the size of the CU can range from 8 ⁇ 8 pixels to a maximum 64 ⁇ 64 pixels or larger tree block size.
- Each CU may contain one or more PUs and one or more TUs.
- the syntax data associated with a CU may describe a case where a CU is partitioned into one or more PUs.
- the partitioning mode may be different between cases where the CU is skipped or is encoded in direct mode, intra prediction mode, or inter prediction mode.
- the PU can be divided into non-square shapes.
- the syntax data associated with a CU may also describe a case where a CU is partitioned into one or more TUs according to a quadtree.
- the shape of the TU can be square or non-square.
- the HEVC standard allows transformation based on the TU, which can be different for different CUs.
- the TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, but this may not always be the case.
- the size of the TU is usually the same as or smaller than the PU.
- a quad-tree structure called "residual quad-tree" (RQT) can be used to subdivide the residual samples corresponding to the CU into smaller units.
- the leaf node of RQT may be called TU.
- the pixel difference values associated with the TU may be transformed to produce a transformation coefficient, which may be quantized.
- the PU contains data related to the prediction process.
- the PU may include data describing the intra-prediction mode of the PU.
- the PU may include data defining a motion vector of the PU.
- the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (e.g., quarter-pixel accuracy or eighth-pixel accuracy), the reference image pointed to by the motion vector, and/or the reference image list of the motion vector (e.g., list 0, list 1, or list C).
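The motion data a PU may carry, as described above, can be grouped into an illustrative container (the field names and string values are assumptions for illustration, not codec syntax elements):

```python
from dataclasses import dataclass

@dataclass
class PuMotionInfo:
    """Illustrative container for the motion data a PU may carry."""
    mv_horizontal: int   # horizontal component of the motion vector
    mv_vertical: int     # vertical component of the motion vector
    mv_resolution: str   # e.g. "quarter-pel" or "eighth-pel" accuracy
    ref_index: int       # index of the reference image pointed to
    ref_list: str        # reference image list: "list0", "list1", or "listC"
```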
- TU uses transform and quantization processes.
- a given CU with one or more PUs may also contain one or more TUs.
- video encoder 100 may calculate a residual value corresponding to the PU.
- the residual values include pixel differences that can be transformed into transform coefficients, quantized, and scanned using TU to generate serialized transform coefficients for entropy decoding.
- This application generally uses the term "video block" to refer to the decoding node of a CU.
- the term “video block” may also be used in this application to refer to a tree block including a decoding node and a PU and a TU, such as an LCU or a CU.
- a video sequence usually contains a series of video frames or images.
- a group of pictures (GOP) exemplarily includes a series of one or more video pictures.
- the GOP may include syntax data in the header information of the GOP, the header information of one or more of the pictures, or elsewhere, and the syntax data describes the number of pictures included in the GOP.
- Each slice of the image may contain slice syntax data describing the coding mode of the corresponding image.
- Video encoder 100 typically operates on video blocks within individual video slices to encode video data.
- a video block may correspond to a decoding node within a CU.
- Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
- HM supports prediction with various PU sizes. Assuming the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, and the other direction is partitioned into 25% and 75%.
- 2N×nU refers to a horizontally partitioned 2N×2N CU, where the 2N×0.5N PU is at the top and the 2N×1.5N PU is at the bottom.
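The symmetric and asymmetric PU partition sizes described above can be tabulated as follows (a sketch of the HM partition geometry; the mode-name strings are illustrative):

```python
def pu_sizes(mode, n):
    """Return the PU (width, height) pairs for a 2Nx2N CU under the HM
    partition modes, including the asymmetric 25%/75% splits."""
    two_n = 2 * n
    table = {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(two_n, n // 2), (two_n, two_n - n // 2)],  # 25% top, 75% bottom
        "2NxnD": [(two_n, two_n - n // 2), (two_n, n // 2)],  # 75% top, 25% bottom
        "nLx2N": [(n // 2, two_n), (two_n - n // 2, two_n)],  # 25% left, 75% right
        "nRx2N": [(two_n - n // 2, two_n), (n // 2, two_n)],  # 75% left, 25% right
    }
    return table[mode]
```

For a 16×16 CU (N = 8), mode 2N×nU yields a 16×4 PU at the top and a 16×12 PU at the bottom, matching the 2N×0.5N / 2N×1.5N description above.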
- “N×N” and “N by N” are used interchangeably to refer to the pixel size of a video block in the vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels.
- an N ⁇ N block has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value.
- Pixels in a block can be arranged in rows and columns.
- the block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction.
- a block may include N ⁇ M pixels, where M is not necessarily equal to N.
- the video encoder 100 may calculate the residual data of the TU of the CU.
- a PU may include pixel data in a spatial domain (also referred to as a pixel domain), and a TU may include coefficients in the transform domain after a transform (e.g., a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data.
- the residual data may correspond to a pixel difference between a pixel of an uncoded image and a prediction value corresponding to a PU.
- the video encoder 100 may form a TU including residual data of a CU, and then transform the TU to generate a transform coefficient of the CU.
- video encoder 100 may perform quantization of the transform coefficients.
- Quantization exemplarily refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent the coefficients to provide further compression.
- the quantization process may reduce the bit depth associated with some or all of the coefficients. For example, n-bit values may be rounded down to m-bit values during quantization, where n is greater than m.
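The bit-depth reduction described above can be sketched as a toy shift-based rounding from n-bit to m-bit values (a simplified illustration of quantization, not the actual HEVC quantizer):

```python
def quantize_bit_depth(value, n_bits, m_bits):
    """Round an n-bit coefficient magnitude down to an m-bit value (n > m),
    discarding the low-order bits to reduce the amount of data."""
    assert n_bits > m_bits
    return value >> (n_bits - m_bits)  # keep the m most significant bits

def dequantize_bit_depth(value, n_bits, m_bits):
    """Approximate inverse: scale back up; the discarded low bits are lost,
    which is the source of quantization error."""
    return value << (n_bits - m_bits)
```

For example, the 10-bit value 1000 quantized to 6 bits becomes 62, and dequantizes back to 992 rather than 1000.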
- the JEM model further improves the coding structure of video images.
- a block coding structure called "Quad Tree Combined with Binary Tree” (QTBT) is introduced.
- a CU can be square or rectangular.
- a CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition.
- there are two partitioning modes in binary tree partitioning: symmetric horizontal partitioning and symmetric vertical partitioning.
- the leaf nodes of a binary tree are called CUs.
- JEM's CUs cannot be further divided during the prediction and transformation process, which means that JEM's CU, PU, and TU have the same block size.
- the maximum size of the CTU is 256 ⁇ 256 luminance pixels.
- the video encoder 100 may utilize a predefined scan order to scan the quantized transform coefficients to generate a serialized vector that can be entropy encoded.
- the video encoder 100 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may perform context-adaptive variable length decoding (CAVLC), context-adaptive binary arithmetic decoding (CABAC), syntax-based context-adaptive binary arithmetic decoding (SBAC), probability interval partition entropy (PIPE) decoding, or other entropy decoding methods to entropy decode the one-dimensional vector.
- Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 to decode the video data.
- video encoder 100 may assign a context within a context model to a symbol to be transmitted. Context can be related to whether adjacent values of a symbol are non-zero.
- the video encoder 100 may select a variable length code of a symbol to be transmitted. Codewords in Variable Length Decoding (VLC) may be constructed such that relatively short codes correspond to more likely symbols and longer codes correspond to less likely symbols. In this way, the use of VLC can achieve the goal of saving code rates relative to using equal length codewords for each symbol to be transmitted.
- the probability in CABAC can be determined based on the context assigned to the symbol.
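The rate-saving idea behind VLC described above can be illustrated with a toy unary code that assigns the shortest codeword to the most probable symbol (this is a didactic scheme, not the actual CAVLC code tables):

```python
def build_vlc_table(symbols_by_probability):
    """Assign prefix-free unary codewords so that symbols earlier in the
    probability ranking (more likely) receive shorter codes, saving rate
    relative to equal-length codewords."""
    return {sym: "1" * rank + "0"
            for rank, sym in enumerate(symbols_by_probability)}
```

With symbols ranked A, B, C from most to least probable, A gets the 1-bit code "0" while C gets the 3-bit code "110".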
- the video encoder may perform inter prediction to reduce temporal redundancy between images.
- a CU may have one or more prediction units (PUs) according to the provisions of different video compression codec standards.
- multiple PUs may belong to a CU, or PUs and CUs are the same size.
- when the PU and the CU are the same size, the CU's partitioning mode is that the CU is not divided, or the CU is divided into one PU; the term PU is used uniformly in the description herein.
- the video encoder may signal motion information for the PU to the video decoder.
- the motion information of the PU may include: a reference image index, a motion vector, and a prediction direction identifier.
- a motion vector may indicate a displacement between an image block (also called a video block, a pixel block, a pixel set, etc.) of a PU and a reference block of the PU.
- the reference block of the PU may be a part of the reference picture similar to the image block of the PU.
- the reference block may be located in a reference image indicated by a reference image index and a prediction direction identifier.
- the video encoder may generate a candidate prediction motion vector (MV) list for each of the PUs according to the merge prediction mode or the advanced motion vector prediction mode process.
- Each candidate prediction motion vector in the candidate prediction motion vector list for the PU may indicate motion information.
- the motion information indicated by some candidate prediction motion vectors in the candidate prediction motion vector list may be based on the motion information of other PUs. If the candidate prediction motion vector indicates motion information specifying one of a spatial candidate prediction motion vector position or a temporal candidate prediction motion vector position, the present application may refer to the candidate prediction motion vector as an "original" candidate prediction motion vector.
- a merge mode also referred to herein as a merge prediction mode
- the video encoder may generate additional candidate prediction motion vectors by combining partial motion vectors from different original candidate prediction motion vectors, modifying the original candidate prediction motion vectors, or inserting only zero motion vectors as candidate prediction motion vectors. These additional candidate prediction motion vectors are not considered as original candidate prediction motion vectors and may be referred to as artificially generated candidate prediction motion vectors in this application.
- the techniques of this application generally relate to a technique for generating a list of candidate prediction motion vectors at a video encoder and a technique for generating the same list of candidate prediction motion vectors at a video decoder.
- the video encoder and video decoder may generate the same candidate prediction motion vector list by implementing the same techniques used to construct the candidate prediction motion vector list. For example, both a video encoder and a video decoder may build a list with the same number of candidate prediction motion vectors (eg, five candidate prediction motion vectors).
- Video encoders and decoders may first consider spatial candidate prediction motion vectors (e.g., neighboring blocks in the same image), then consider temporal candidate prediction motion vectors (e.g., candidate prediction motion vectors in different images), and finally add artificially generated candidate prediction motion vectors until the desired number of candidate prediction motion vectors has been added to the list.
- a pruning operation may be used for certain types of candidate prediction motion vectors during the construction of the candidate prediction motion vector list to remove duplicates from the list, while for other types of candidate prediction motion vectors, pruning may be omitted to reduce decoder complexity.
- a pruning operation may be performed to exclude candidate prediction motion vectors with duplicate motion information from the list of candidate prediction motion vectors.
- artificially generated candidate prediction motion vectors may be added without performing a pruning operation on them.
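The list-construction order and pruning behavior described above can be sketched as follows; the function name, the `(mv_x, mv_y, ref_idx)` tuple layout, and the five-candidate limit are illustrative assumptions, not taken from any codec's source:

```python
# Illustrative sketch: spatial candidates first, then temporal, then zero-MV
# fill. "Original" (spatial/temporal) candidates are pruned for duplicates;
# artificially generated candidates are appended without pruning.
MAX_CANDIDATES = 5  # e.g., both encoder and decoder agree on five candidates

def build_candidate_list(spatial, temporal):
    """spatial/temporal: lists of (mv_x, mv_y, ref_idx) tuples."""
    candidates = []
    for cand in spatial + temporal:          # original candidates
        if cand not in candidates:           # pruning: drop duplicate motion info
            candidates.append(cand)
        if len(candidates) == MAX_CANDIDATES:
            return candidates
    while len(candidates) < MAX_CANDIDATES:  # artificial candidates, no pruning
        candidates.append((0, 0, 0))         # zero motion vector
    return candidates
```

Because both sides run the same procedure over the same inputs, the encoder and decoder arrive at identical lists, which is what lets an index alone identify the selected candidate.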
- the video encoder may select the candidate prediction motion vector from the candidate prediction motion vector list and output the candidate prediction motion vector index in the code stream.
- the selected candidate prediction motion vector may be a candidate prediction motion vector having a motion vector that most closely matches the predictor of the target PU being decoded.
- the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
- the video encoder may also generate a predictive image block for the PU based on a reference block indicated by the motion information of the PU. The motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector.
- the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- the motion information of the PU may be determined based on the motion vector difference of the PU and the motion information indicated by the selected candidate prediction motion vector.
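The two derivations just described, copying the candidate's motion information directly versus adding a motion vector difference to it, can be sketched as below; the function and parameter names are illustrative:

```python
# Sketch of deriving PU motion information from the selected candidate:
# with no MVD (merge-style), the candidate's motion vector is reused as-is;
# with an MVD (AMVP-style), the difference is added to the candidate.
def derive_motion_info(candidate_mv, mvd=None):
    """candidate_mv: (x, y); mvd: (dx, dy), or None to copy the candidate."""
    if mvd is None:                       # copy candidate motion information
        return candidate_mv
    return (candidate_mv[0] + mvd[0],     # predictor + signaled difference
            candidate_mv[1] + mvd[1])
```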
- the video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the PU of the CU and the original image blocks for the CU. The video encoder may then encode one or more residual image blocks and output one or more residual image blocks in a code stream.
- the codestream may include data identifying a selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
- the video decoder may determine the motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list of the PU.
- the video decoder may identify one or more reference blocks for the PU based on the motion information of the PU. After identifying one or more reference blocks of the PU, the video decoder may generate predictive image blocks for the PU based on the one or more reference blocks of the PU.
- the video decoder may reconstruct an image block for a CU based on a predictive image block for a PU of the CU and one or more residual image blocks for the CU.
- the present application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description can be interpreted to mean that the position or image block and the image block associated with the CU or PU have various spatial relationships.
- a PU currently being decoded by a video decoder may be referred to as a current PU, and may also be referred to as a current image block to be processed.
- This application may refer to the CU that the video decoder is currently decoding as the current CU.
- This application may refer to the image currently being decoded by the video decoder as the current image. It should be understood that this application is applicable to the case where the PU and the CU have the same size, or the PU is the CU, and the PU is used to represent both.
- video encoder 100 may use inter prediction to generate predictive image blocks and motion information for a PU of a CU.
- the motion information of a given PU may be the same or similar to the motion information of one or more nearby PUs (ie, PUs whose image blocks are spatially or temporally near the image blocks of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may refer to the motion information of nearby PUs to encode motion information for a given PU. Encoding the motion information of a given PU with reference to the motion information of nearby PUs can reduce the number of encoding bits required to indicate the motion information of a given PU in the code stream.
- Video encoder 100 may refer to motion information of nearby PUs in various ways to encode motion information for a given PU.
- video encoder 100 may indicate that the motion information of a given PU is the same as the motion information of nearby PUs.
- This application may use a merge mode to refer to indicating that the motion information of a given PU is the same as that of nearby PUs or may be derived from the motion information of nearby PUs.
- the video encoder 100 may calculate a Motion Vector Difference (MVD) for a given PU.
- MVD indicates the difference between the motion vector of a given PU and the motion vector of a nearby PU.
- Video encoder 100 may include the MVD, rather than the motion vector itself, in the motion information of the given PU. Representing the MVD in the codestream requires fewer coding bits than representing the full motion vector of the given PU.
- This application may use advanced motion vector prediction mode to refer to signaling the motion information of a given PU by using the MVD and an index value identifying a candidate motion vector.
- the video encoder 100 may generate a list of candidate predicted motion vectors for a given PU.
- the candidate prediction motion vector list may include one or more candidate prediction motion vectors.
- Each of the candidate prediction motion vectors in the candidate prediction motion vector list for a given PU may specify motion information.
- the motion information indicated by each candidate prediction motion vector may include a motion vector, a reference image index, and a prediction direction identifier.
- the candidate prediction motion vectors in the candidate prediction motion vector list may include "original" candidate prediction motion vectors, each of which indicates motion information at one of the specified candidate prediction motion vector positions within a PU different from the given PU.
- the video encoder 100 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list for the PU. For example, a video encoder may compare each candidate prediction motion vector with the PU being decoded and may select a candidate prediction motion vector with a desired code rate-distortion cost. Video encoder 100 may output a candidate prediction motion vector index for a PU. The candidate prediction motion vector index may identify the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
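The selection step can be sketched as below; the cost function is passed in abstractly because the preceding bullet leaves the exact rate-distortion metric open, and the names are illustrative:

```python
# Illustrative candidate selection: evaluate each candidate with a caller-
# supplied rate-distortion cost function and signal the index of the cheapest
# one (the index identifies its position in the shared candidate list).
def select_candidate(candidates, rd_cost):
    costs = [rd_cost(c) for c in candidates]
    best_index = costs.index(min(costs))   # position written to the bitstream
    return best_index, candidates[best_index]
```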
- the video encoder 100 may generate a predictive image block for a PU based on a reference block indicated by motion information of the PU.
- the motion information of the PU may be determined based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU.
- the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- motion information of a PU may be determined based on a motion vector difference for the PU and motion information indicated by a selected candidate prediction motion vector.
- Video encoder 100 may process predictive image blocks for a PU as described previously.
- video decoder 200 may generate a list of candidate predicted motion vectors for each of the PUs of the CU.
- the candidate prediction motion vector list generated by the video decoder 200 for the PU may be the same as the candidate prediction motion vector list generated by the video encoder 100 for the PU.
- the syntax element parsed from the bitstream may indicate the position of the candidate prediction motion vector selected in the candidate prediction motion vector list of the PU.
- the video decoder 200 may generate predictive image blocks for the PU based on one or more reference blocks indicated by the motion information of the PU.
- Video decoder 200 may determine motion information of the PU based on the motion information indicated by the selected candidate prediction motion vector in the candidate prediction motion vector list for the PU. Video decoder 200 may reconstruct an image block for a CU based on a predictive image block for a PU and a residual image block for a CU.
- the construction of the candidate prediction motion vector list and the parsing of the selected candidate prediction motion vector from the code stream are independent of each other, and can be performed in any order or in parallel.
- the position of the selected candidate prediction motion vector in the candidate prediction motion vector list is first parsed from the code stream, and a candidate prediction motion vector list is constructed based on the parsed position.
- if the selected candidate prediction motion vector, obtained by parsing the bitstream, is the candidate prediction motion vector with index 3 in the candidate prediction motion vector list, only the candidate prediction motion vectors from index 0 to index 3 need to be constructed to determine the candidate with index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
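The parse-first shortcut can be sketched as follows; `candidate_source` stands in for the agreed derivation order (spatial, temporal, artificial), and the names are illustrative:

```python
# Decode-order optimization sketch: the selected index is parsed from the
# bitstream first, so the candidate list only needs to be built up to that
# index instead of in full.
def build_list_up_to(candidate_source, parsed_index):
    partial_list = []
    for cand in candidate_source:          # candidates in the agreed order
        partial_list.append(cand)
        if len(partial_list) == parsed_index + 1:
            break                          # e.g., index 3 -> build only 0..3
    return partial_list[parsed_index]
```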
- FIG. 2 is a block diagram of a video encoder 100 according to an example described in the embodiment of the present application.
- the video encoder 100 is configured to output a video to the post-processing entity 41.
- the post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a stitching / editing device.
- the post-processing entity 41 may be an instance of a network entity.
- the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases, the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100.
- the post-processing entity 41 is an example of the storage device 40 of FIG. 1.
- the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a decoded image buffer (DPB) 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103.
- the prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109.
- the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111.
- the filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- the filter unit 106 is shown as an in-loop filter in FIG. 2A, in other implementations, the filter unit 106 may be implemented as a post-loop filter.
- the video encoder 100 may further include a video data memory and a segmentation unit (not shown in the figure).
- the video data memory may store video data to be encoded by the components of the video encoder 100.
- the video data stored in the video data storage may be obtained from the video source 120.
- the DPB 107 may be a reference image memory that stores reference video data used by the video encoder 100 to encode video data in an intra-frame or inter-frame decoding mode.
- Video data memory and DPB 107 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
- Video data storage and DPB 107 can be provided by the same storage device or separate storage devices.
- the video data memory may be on-chip with other components of video encoder 100 or off-chip relative to those components.
- the video encoder 100 receives video data and stores the video data in a video data memory.
- the segmentation unit divides the video data into several image blocks, and these image blocks can be further divided into smaller blocks, such as image block segmentation based on a quad tree structure or a binary tree structure. This segmentation may also include segmentation into slices, tiles, or other larger units.
- Video encoder 100 typically illustrates components that encode image blocks within a video slice to be encoded.
- the slice can be divided into multiple image blocks (and possibly into collections of image blocks referred to as tiles).
- the prediction processing unit 108 may select one of a plurality of possible coding modes for the current image block, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes.
- the prediction processing unit 108 may provide the resulting intra- or inter-coded block to the summer 112 to generate a residual block, and to the summer 111 to reconstruct an encoded block used as a reference image.
- the intra predictor 109 within the prediction processing unit 108 may perform intra predictive encoding of the current image block with respect to one or more neighboring blocks in the same frame or slice as the current block to be encoded to remove spatial redundancy.
- the inter predictor 110 within the prediction processing unit 108 may perform inter predictive coding of the current image block with respect to one or more prediction blocks in the one or more reference images to remove temporal redundancy.
- the inter predictor 110 may be configured to determine an inter prediction mode for encoding the current image block. For example, the inter predictor 110 may use rate-distortion analysis to calculate the rate-distortion values of the various inter prediction modes in the set of candidate inter prediction modes, and select from them the inter prediction mode with the best rate-distortion characteristics. Rate-distortion analysis generally determines the amount of distortion (or error) between the coded block and the original uncoded block from which it was produced, as well as the bit rate (that is, the number of bits) used to produce the coded block. For example, the inter predictor 110 may determine that the inter prediction mode in the candidate set with the lowest rate-distortion cost for encoding the current image block is the inter prediction mode used for inter prediction of the current image block.
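A rate-distortion cost of this kind is commonly expressed as J = D + λ·R, where D is a distortion measure, R the number of coding bits, and λ the trade-off weight. A minimal sketch, assuming a sum-of-squared-errors distortion and a caller-supplied bit count (both assumptions, not the patent's specific cost):

```python
# Illustrative rate-distortion cost J = D + lambda * R:
#   D = sum of squared errors between original and reconstructed samples,
#   R = number of bits needed to code the block, weighted by lam.
def rd_cost(original, reconstructed, bits, lam):
    distortion = sum((o - r) ** 2
                     for o, r in zip(original, reconstructed))
    return distortion + lam * bits
```

The mode or candidate minimizing this cost is the one the encoder selects and signals.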
- the inter predictor 110 is configured to predict motion information (for example, motion vectors) of one or more sub-blocks in the current image block based on the determined inter prediction mode, and use the motion information (for example, motion vectors) of the sub-blocks to obtain or generate a prediction block of the current image block.
- the inter predictor 110 may locate a prediction block pointed to by the motion vector in one of the reference image lists.
- the inter predictor 110 may also generate syntax elements associated with image blocks and video slices for use by the video decoder 200 when decoding image blocks of the video slice.
- the inter predictor 110 uses the motion information of each sub-block to perform a motion compensation process to generate a prediction block of each sub-block, thereby obtaining a prediction block of the current image block;
- the inter predictor 110 performs motion estimation and motion compensation processes.
- the inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected inter prediction mode.
- the intra predictor 109 may perform intra prediction on the current image block.
- the intra predictor 109 may determine an intra prediction mode used to encode the current block.
- the intra predictor 109 may use rate-distortion analysis to calculate the rate-distortion values of the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from among the tested modes. In any case, after an intra prediction mode is selected for the image block, the intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to the entropy encoder 103 so that the entropy encoder 103 encodes the information indicating the selected intra prediction mode.
- the video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded.
- the summer 112 represents one or more components that perform this subtraction operation.
- the residual video data in the residual block may be included in one or more TUs and applied to the transformer 101.
- the transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform.
- the transformer 101 may transform the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
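The pixel-to-frequency transform can be illustrated with a 1-D DCT-II on a row of residual samples; real codecs use 2-D integer approximations of this transform, so this is a conceptual sketch only:

```python
# 1-D orthonormal DCT-II of a residual row: a constant (flat) residual maps
# entirely to the DC coefficient, illustrating the move to the frequency domain.
import math

def dct_ii(residual):
    n = len(residual)
    coeffs = []
    for k in range(n):
        s = sum(r * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, r in enumerate(residual))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * s)
    return coeffs
```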
- the transformer 101 may send the obtained transform coefficients to a quantizer 102.
- a quantizer 102 quantizes the transform coefficients to further reduce the bit code rate.
- the quantizer 102 may then perform a scan of a matrix containing the quantized transform coefficients.
- the entropy encoder 103 may perform scanning.
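The scan of the quantized-coefficient matrix serializes the 2-D block into a 1-D order for entropy coding. A zig-zag scan is a common choice and is sketched below as an illustration; actual scan patterns vary by codec and mode:

```python
# Zig-zag scan: traverse anti-diagonals of an n x n block, alternating
# direction, so low-frequency coefficients come first in the output.
def zigzag_scan(block):
    """block: n x n list of lists -> flat list in zig-zag order."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],                      # diagonal
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[i][j] for i, j in order]
```

Front-loading the low-frequency coefficients groups the trailing zeros produced by quantization, which the entropy coder can then represent compactly.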
- After quantization, the entropy encoder 103 entropy encodes the quantized transform coefficients. For example, the entropy encoder 103 can perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique.
- the encoded code stream may be transmitted to the video decoder 200, or archived for later transmission or retrieved by the video decoder 200.
- the entropy encoder 103 may also perform entropy encoding on other syntax elements.
- the inverse quantizer 104 and the inverse transformer 105 respectively apply inverse quantization and inverse transform to reconstruct the residual block in the pixel domain, for example, for later use as a reference block of a reference image.
- the summer 111 adds the reconstructed residual block to a prediction block generated by the inter predictor 110 or the intra predictor 109 to generate a reconstructed image block.
- the filter unit 106 may be applied to the reconstructed image block to reduce distortion, such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer 107 and can be used by the inter predictor 110 as a reference block to perform inter prediction on blocks in subsequent video frames or images.
- the video encoder 100 may directly quantize the residual signal without processing by the transformer 101 and, correspondingly, without processing by the inverse transformer 105; or, for some image blocks or image frames, the video encoder 100 does not generate residual data and accordingly does not need processing by the transformer 101, quantizer 102, inverse quantizer 104, and inverse transformer 105; or, the video encoder 100 may store the reconstructed image blocks directly as reference blocks without processing by the filter unit 106; alternatively, the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
- FIG. 3 is a block diagram of an example video decoder 200 described in the embodiment of the present application.
- the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207.
- the prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209.
- video decoder 200 may perform a decoding process that is substantially inverse to the encoding process described with respect to video encoder 100 from FIG. 2.
- the video decoder 200 receives from the video encoder 100 an encoded video codestream representing image blocks of the encoded video slice and associated syntax elements.
- the video decoder 200 may receive video data from the network entity 42, optionally, the video data may also be stored in a video data storage (not shown in the figure).
- the video data memory may store video data, such as an encoded video code stream, to be decoded by components of the video decoder 200.
- the video data stored in the video data memory can be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium.
- the video data memory can serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories. The video data memory and DPB 207 can be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200 or provided off-chip relative to those components.
- the network entity 42 may be, for example, a server, a MANE, a video editor / splicer, or other such device for implementing one or more of the techniques described above.
- the network entity 42 may or may not include a video encoder, such as video encoder 100.
- the network entity 42 may implement some of the techniques described in this application.
- the network entity 42 and the video decoder 200 may be part of separate devices, while in other cases, the functionality described with respect to the network entity 42 may be performed by the same device including the video decoder 200.
- the network entity 42 may be an example of the storage device 40 of FIG. 1.
- the entropy decoder 203 of the video decoder 200 entropy decodes the code stream to generate quantized coefficients and some syntax elements.
- the entropy decoder 203 forwards the syntax elements to the prediction processing unit 208.
- Video decoder 200 may receive syntax elements at a video slice level and / or an image block level.
- the intra predictor 209 of the prediction processing unit 208 may generate prediction blocks for the image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image.
- the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode.
- the inter predictor 210 may determine whether to use a new inter prediction mode to predict the current image block of the current video slice. If the syntax elements indicate that a new inter prediction mode is used, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block of the current video slice or of a sub-block of the current image block, and then uses that motion information to obtain or generate a prediction block of the current image block or its sub-block through a motion compensation process.
- the motion information here may include reference image information and motion vectors, where the reference image information may include but is not limited to unidirectional / bidirectional prediction information, a reference image list number, and a reference image index corresponding to the reference image list.
- a prediction block may be generated from one of reference pictures within one of the reference picture lists.
- the video decoder 200 may construct a reference image list, that is, a list 0 and a list 1, based on the reference images stored in the DPB 207.
- the reference frame index of the current image may be included in one or more of the reference frame list 0 and list 1.
- the video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal both whether a new inter prediction mode is used and which new inter prediction mode is used to decode the specific block. It should be understood that the inter predictor 210 here performs a motion compensation process.
- the inverse quantizer 204 inverse quantizes, that is, dequantizes, the quantized transform coefficients provided in the code stream and decoded by the entropy decoder 203.
- the inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that should be applied and similarly to determine the degree of inverse quantization that should be applied.
- the inverse transformer 205 applies an inverse transform to transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process to generate a residual block in the pixel domain.
- the video decoder 200 obtains the reconstructed block, that is, the decoded image block, by summing the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210.
- the summer 211 represents a component that performs this summing operation.
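The summing operation is element-wise: decoded residual plus prediction, clipped to the valid sample range. A minimal sketch, assuming 8-bit samples and flat sample lists (both illustrative assumptions):

```python
# Decoder-side reconstruction: residual + prediction per sample, clipped to
# [0, 2^bit_depth - 1] so the result is a valid pixel value.
def reconstruct(residual, prediction, bit_depth=8):
    max_val = (1 << bit_depth) - 1
    return [min(max(r + p, 0), max_val)
            for r, p in zip(residual, prediction)]
```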
- if needed, a loop filter (in or after the decoding loop) may be used to smooth pixel transitions or otherwise improve the video quality.
- the filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.
- the filter unit 206 is shown as an in-loop filter in FIG. 2B, in other implementations, the filter unit 206 may be implemented as a post-loop filter.
- the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as the decoded video stream.
- a decoded image block in a given frame or image may also be stored in a decoded image buffer 207, and the decoded image buffer 207 stores a reference image for subsequent motion compensation.
- the decoded image buffer 207 may be part of a memory, which may also store the decoded video for later presentation on a display device, such as the display device 220 of FIG. 1, or may be separate from such memory.
- the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients and accordingly does not need processing by the inverse quantizer 204 and the inverse transformer 205.
- the techniques of this application exemplarily involve inter-frame decoding. It should be understood that the techniques of this application may be performed by any of the video decoders described in this application.
- the video decoder includes, for example, the video encoder 100 and the video decoder 200 shown and described with respect to FIGS. 2 and 3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data.
- a reference to a generic "video encoder" or "video decoder” may include video encoder 100, video decoder 200, or another video encoding or coding unit.
- FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of the present application.
- the inter prediction module 121 may include a motion estimation unit 42 and a motion compensation unit 44.
- the relationship between PU and CU is different in different video compression codecs.
- the inter prediction module 121 may partition a current CU into a PU according to a plurality of partitioning modes.
- the inter prediction module 121 may partition a current CU into a PU according to 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, and N ⁇ N partition modes.
- the current CU is the current PU, which is not limited.
- the inter prediction module 121 may perform integer motion estimation (IME) and then perform fractional motion estimation (FME) on each of the PUs.
- the inter prediction module 121 may search a reference block for a PU in one or more reference images. After the reference block for the PU is found, the inter prediction module 121 may generate a motion vector indicating the spatial displacement between the PU and the reference block for the PU with integer precision.
- the inter prediction module 121 may refine, by performing FME on the PU, the motion vector generated by performing IME on the PU.
- a motion vector generated by performing FME on a PU may have sub-integer precision (e.g., 1/2 pixel precision, 1/4 pixel precision, etc.).
- the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
- the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
- the candidate prediction motion vector list may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
- the inter prediction module 121 may select the candidate prediction motion vector from the candidate prediction motion vector list and generate a motion vector difference (MVD) for the PU.
- the MVD for a PU may indicate a difference between a motion vector indicated by a selected candidate prediction motion vector and a motion vector generated for the PU using IME and FME.
- the inter prediction module 121 may output a candidate prediction motion vector index that identifies the position of the selected candidate prediction motion vector in the candidate prediction motion vector list.
- the inter prediction module 121 may also output the MVD of the PU.
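The encoder-side steps above (select a candidate predictor, form the MVD, output the candidate index and the MVD) can be sketched as follows. This is a minimal illustration, not the patent's normative procedure; the function names and the bit-cost heuristic are assumptions introduced here for clarity.

```python
# Illustrative sketch: choosing a candidate prediction motion vector and
# computing the motion vector difference (MVD) the encoder would signal.
# mvd_bits is a hypothetical proxy for coding cost, not a real entropy coder.

def mvd_bits(mvd):
    """Rough proxy for the bits needed to code an MVD (larger -> more bits)."""
    return abs(mvd[0]) + abs(mvd[1])

def select_candidate(candidates, actual_mv):
    """Pick the candidate whose motion vector yields the cheapest MVD and
    return (candidate index, MVD), the two values output by the encoder."""
    best_idx, best_mvd = None, None
    for idx, cand_mv in enumerate(candidates):
        mvd = (actual_mv[0] - cand_mv[0], actual_mv[1] - cand_mv[1])
        if best_mvd is None or mvd_bits(mvd) < mvd_bits(best_mvd):
            best_idx, best_mvd = idx, mvd
    return best_idx, best_mvd

# Motion vector found for the PU via IME and FME (illustrative units).
actual_mv = (18, -7)
# Candidate prediction motion vector list (illustrative values).
candidates = [(16, -8), (0, 0)]
idx, mvd = select_candidate(candidates, actual_mv)
# idx identifies the selected candidate's position in the list; mvd is
# signaled alongside it.
```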
- a detailed implementation of the advanced motion vector prediction (AMVP) mode in this embodiment of the present application is described in detail below with reference to FIG. 6.
- the inter prediction module 121 may also perform a merge operation on each of the PUs.
- the inter prediction module 121 may generate a list of candidate prediction motion vectors for the PU.
- the candidate prediction motion vector list for the PU may include one or more original candidate prediction motion vectors and one or more additional candidate prediction motion vectors derived from the original candidate prediction motion vectors.
- the original candidate prediction motion vector in the candidate prediction motion vector list may include one or more spatial candidate prediction motion vectors and temporal candidate prediction motion vectors.
- the spatial candidate prediction motion vector may indicate motion information of other PUs in the current image.
- the temporal candidate prediction motion vector may be based on motion information of a corresponding PU in a picture different from the current picture.
- the temporal candidate prediction motion vector may also be referred to as temporal motion vector prediction (TMVP).
- the inter prediction module 121 may select one of the candidate prediction motion vectors from the candidate prediction motion vector list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In the merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate prediction motion vector.
- FIG. 5, described below, illustrates an exemplary flowchart of the merge mode.
- the inter prediction module 121 may select either the predictive image block generated through the FME operation or the predictive image block generated through the merge operation. In some feasible implementations, the inter prediction module 121 may select a predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block generated by the FME operation and the predictive image block generated by the merge operation.
- the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may select the partitioning mode based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes.
- the inter prediction module 121 may output a predictive image block associated with a PU belonging to the selected partition mode to the residual generation module 102.
- the inter prediction module 121 may output a syntax element indicating motion information of a PU belonging to the selected partitioning mode to the entropy encoding module 116.
- the inter prediction module 121 includes IME modules 180A to 180N (collectively referred to as "IME module 180"), FME modules 182A to 182N (collectively referred to as "FME module 182"), merge modules 184A to 184N (collectively referred to as "merge module 184"), PU mode decision modules 186A to 186N (collectively referred to as "PU mode decision module 186"), and a CU mode decision module 188 (which may also perform a mode decision process from CTU to CU).
- the IME module 180, the FME module 182, and the merge module 184 may perform an IME operation, an FME operation, and a merge operation on a PU of the current CU.
- the inter prediction module 121 is illustrated in the schematic diagram of FIG. 4 as including a separate IME module 180, an FME module 182, and a merging module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, an FME module 182, and a merge module 184 for each PU of each partitioning mode of the CU.
- the IME module 180A, the FME module 182A, and the merge module 184A may perform IME operations, FME operations, and merge operations on a PU generated by dividing a CU according to a 2N ⁇ 2N split mode.
- the PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
- the IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on a left PU generated by dividing a CU according to an N ⁇ 2N division mode.
- the PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
- the IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on a right PU generated by dividing a CU according to an N ⁇ 2N division mode.
- the PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
- the IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on a lower right PU generated by dividing a CU according to an N×N division mode.
- the PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
- the PU mode decision module 186 may select a predictive image block based on a rate-distortion cost analysis of a plurality of possible predictive image blocks, and select the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-constrained applications, the PU mode decision module 186 may prefer to select predictive image blocks that increase the compression ratio, while for other applications, the PU mode decision module 186 may prefer to select predictive image blocks that increase the quality of the reconstructed video.
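The rate-distortion trade-off described above can be sketched as a Lagrangian cost comparison. This is a toy illustration under assumed numbers; the option names, distortion values, and lambda weights are invented here and do not come from the patent.

```python
# Illustrative rate-distortion mode decision: cost = distortion + lambda * rate.
# A large lambda penalizes bits (bandwidth-constrained case); a small lambda
# favors low distortion (quality-oriented case). All values are hypothetical.

def rd_cost(distortion, rate_bits, lam):
    return distortion + lam * rate_bits

def choose_prediction(options, lam):
    """options: list of (name, distortion, rate_bits); returns the name of
    the option with the lowest rate-distortion cost."""
    return min(options, key=lambda o: rd_cost(o[1], o[2], lam))[0]

options = [
    ("fme", 120.0, 40),    # accurate prediction, but more bits for the MVD
    ("merge", 150.0, 6),   # slightly worse prediction, very cheap to signal
]
# Bandwidth-constrained application: large lambda favors fewer bits.
low_rate_choice = choose_prediction(options, lam=4.0)
# Quality-oriented application: small lambda favors lower distortion.
high_quality_choice = choose_prediction(options, lam=0.1)
```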
- the CU mode decision module 188 selects a partitioning mode for the current CU and outputs the predictive image block and motion information of the PU belonging to the selected partitioning mode.
- FIG. 5 is an exemplary flowchart of a merge mode in an embodiment of the present application.
- a video encoder (e.g., video encoder 100) may perform the merge operation 200.
- the video encoder may perform a merge operation different from the merge operation 200.
- the video encoder may perform a merge operation, where the video encoder performs more or fewer steps than the merge operation 200 or steps different from the merge operation 200.
- the video encoder may perform the steps of the merge operation 200 in a different order or in parallel.
- the encoder may also perform a merge operation 200 on a PU encoded in a skip mode.
- the video encoder may generate a list of candidate predicted motion vectors for the current PU (202).
- the video encoder may generate a list of candidate prediction motion vectors for the current PU in various ways. For example, the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one of the example techniques described below with respect to FIGS. 8-12.
- the candidate prediction motion vector list for the current PU may include a temporal candidate prediction motion vector.
- the temporal candidate prediction motion vector may indicate motion information of a co-located PU in the time domain.
- a co-located PU may be spatially in the same position in the image frame as the current PU, but in a reference picture instead of the current picture.
- a reference picture that includes the co-located PU may be referred to as a related reference picture.
- a reference image index of a related reference image may be referred to as a related reference image index in this application.
- the current image may be associated with one or more reference image lists (e.g., list 0, list 1, etc.).
- the reference image index may indicate a reference image by indicating a position in a reference image list of the reference image.
- the current image may be associated with a combined reference image list.
- the related reference picture index is the reference picture index of the PU covering the reference index source position associated with the current PU.
- the reference index source location associated with the current PU is adjacent to the left of the current PU or above the current PU.
- in this application, a PU is said to "cover" a specific location if the image block associated with the PU includes that location.
- the video encoder can use a zero reference image index.
- the reference index source location associated with the current PU is within the current CU.
- a PU may need to access motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use motion information (ie, a reference picture index) of a PU belonging to the current CU to generate a temporal candidate prediction motion vector for the current PU. In other words, these video encoders may use motion information of a PU belonging to the current CU to generate the temporal candidate prediction motion vector. Therefore, the video encoder may not be able to generate candidate prediction motion vector lists in parallel for the current PU and for the PU covering the reference index source position associated with the current PU.
- the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and other PUs of the current CU in parallel. Because the video encoder explicitly sets the relevant reference picture index, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations where the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (e.g., 0).
- the video encoder may generate a temporal candidate prediction motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate prediction motion vector in the candidate prediction motion vector list of the current CU.
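Derivation of a temporal candidate with an explicitly set (fixed) reference picture index, as described above, can be sketched as a simple lookup that never touches the motion information of other PUs in the current CU. The data layout (pictures as dictionaries keyed by block position) and the constant name are assumptions for illustration.

```python
# Illustrative sketch: the temporal candidate is read from the co-located
# block in the reference picture selected by a fixed, predefined preset
# reference picture index, enabling parallel list construction.

PRESET_REF_IDX = 0  # fixed, predefined preset reference picture index (e.g., 0)

def temporal_candidate(ref_picture_list, colocated_pos):
    """ref_picture_list: list of pictures, each modeled as a dict mapping a
    block position to that block's motion vector. Returns the temporal
    candidate MV, or None if the co-located block has no motion info."""
    ref_picture = ref_picture_list[PRESET_REF_IDX]
    return ref_picture.get(colocated_pos)

# Hypothetical list 0: picture 0 holds motion (2, -1) at position (64, 64).
list0 = [{(64, 64): (2, -1)}, {(64, 64): (9, 9)}]
cand = temporal_candidate(list0, (64, 64))
```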
- the video encoder may explicitly signal the related reference picture index in a syntax structure (e.g., a picture header, a slice header, an APS, or another syntax structure).
- the video encoder may signal the relevant reference picture index to the decoder for each LCU (ie, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of the CU is equal to "1".
- the relevant reference image index may be set implicitly rather than explicitly.
- the video encoder may use the motion information of PUs in the reference pictures indicated by the reference picture indexes of PUs covering locations outside the current CU to generate each temporal candidate prediction motion vector in the candidate prediction motion vector lists for the PUs of the current CU, even if these locations are not strictly adjacent to the current PU.
- the video encoder may generate predictive image blocks associated with the candidate prediction motion vectors in the candidate prediction motion vector list (204).
- the video encoder may generate the predictive image block associated with a candidate prediction motion vector by determining the motion information of the current PU based on the motion information indicated by the candidate prediction motion vector, and then generating the predictive image block based on one or more reference blocks indicated by the motion information of the current PU.
- the video encoder may then select one of the candidate prediction motion vectors from the candidate prediction motion vector list (206).
- the video encoder can select candidate prediction motion vectors in various ways. For example, the video encoder may select one of the candidate prediction motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate prediction motion vectors.
- the video encoder may output a candidate prediction motion vector index (208).
- the candidate prediction motion vector index may indicate a position where a candidate prediction motion vector is selected in the candidate prediction motion vector list.
- the candidate prediction motion vector index may be represented as "merge_idx".
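The merge operation steps (202) through (208) above can be condensed into a short end-to-end sketch: build the candidate list, evaluate a cost per candidate (standing in for generating and costing the predictive blocks), and emit the index of the winner as merge_idx. The helper names and the toy cost function are assumptions for illustration.

```python
# Illustrative sketch of the merge operation flow: (202) candidate list,
# (204)/(206) per-candidate evaluation and selection, (208) output merge_idx.

def merge_operation(candidate_mvs, cost_of):
    """candidate_mvs: candidate prediction motion vector list (202).
    cost_of: callable mapping a candidate MV to an RD-style cost, standing in
    for generating the predictive image block (204) and costing it.
    Returns merge_idx (208): the position of the selected candidate (206)."""
    costs = [cost_of(mv) for mv in candidate_mvs]
    return costs.index(min(costs))

# Hypothetical candidate list and a toy cost: L1 distance from the "true"
# motion of the block, used only to make the selection concrete.
true_mv = (5, 3)
candidates = [(0, 0), (4, 3), (5, 2)]
cost = lambda mv: abs(mv[0] - true_mv[0]) + abs(mv[1] - true_mv[1])
merge_idx = merge_operation(candidates, cost)
```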
- FIG. 6 is an exemplary flowchart of an advanced motion vector prediction (AMVP) mode in an embodiment of the present application.
- a video encoder (e.g., video encoder 100) may generate one or more motion vectors for the current PU (211).
- the video encoder may perform integer motion estimation and fractional motion estimation to generate motion vectors for the current PU.
- the current image may be associated with two reference image lists (List 0 and List 1).
- the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU.
- the list 0 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 0.
- the list 1 motion vector may indicate a spatial displacement between an image block of the current PU and a reference block in a reference image in list 1.
- the video encoder may generate a list 0 motion vector and a list 1 motion vector for the current PU.
- the video encoder may generate predictive image blocks for the current PU (212).
- the video encoder may generate predictive image blocks for the current PU based on one or more reference blocks indicated by one or more motion vectors for the current PU.
- the video encoder may generate a list of candidate predicted motion vectors for the current PU (213).
- the video encoder may generate the list of candidate prediction motion vectors for the current PU in various ways.
- the video encoder may generate a list of candidate prediction motion vectors for the current PU according to one or more of the possible implementations described below with respect to FIGS. 8 to 12.
- the list of candidate prediction motion vectors may be limited to two candidate prediction motion vectors.
- the list of candidate prediction motion vectors may include more candidate prediction motion vectors (e.g., five candidate prediction motion vectors).
- the video encoder may generate one or more motion vector differences (MVD) for each candidate prediction motion vector in the list of candidate prediction motion vectors (214).
- the video encoder may generate a motion vector difference for the candidate prediction motion vector by determining a difference between the motion vector indicated by the candidate prediction motion vector and a corresponding motion vector of the current PU.
- if the current PU is uni-directionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bi-directionally predicted, the video encoder may generate two MVDs for each candidate prediction motion vector.
- the first MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 0 motion vector of the current PU.
- the second MVD may indicate a difference between the motion vector of the candidate prediction motion vector and the list 1 motion vector of the current PU.
- the video encoder may select one or more of the candidate prediction motion vectors from the candidate prediction motion vector list (215).
- the video encoder may select one or more candidate prediction motion vectors in various ways. For example, a video encoder may select a candidate prediction motion vector with an associated motion vector that matches the motion vector to be encoded with minimal error, which may reduce the number of bits required to represent the motion vector difference for the candidate prediction motion vector.
- the video encoder may output one or more reference picture indexes for the current PU, one or more candidate prediction motion vector indexes, and one or more motion vector differences for the one or more selected candidate prediction motion vectors (216).
- the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1").
- the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may also output the MVD of the list 0 motion vector or of the list 1 motion vector for the current PU.
- the video encoder may output the reference picture index for list 0 ("ref_idx_l0") and the reference picture index for list 1 ("ref_idx_l1").
- the video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- the video encoder may also output the MVD of the list 0 motion vector for the current PU and the MVD of the list 1 motion vector for the current PU.
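The AMVP output for a bi-predicted PU described above (one reference index, one predictor flag, and one MVD per reference list) can be sketched as assembling the syntax elements the encoder would emit in step (216). The dictionary keys mirror the syntax element names; the motion values are hypothetical.

```python
# Illustrative sketch of AMVP signaling for a bi-predicted PU: ref_idx_l0/
# ref_idx_l1, mvp_l0_flag/mvp_l1_flag, and one MVD per list. Values assumed.

def amvp_signal(list0, list1):
    """Each argument: (actual_mv, predictor_mv, ref_idx, mvp_flag).
    Returns a dict of the syntax elements the encoder would output (216)."""
    out = {}
    for name, (mv, pred, ref_idx, flag) in (("l0", list0), ("l1", list1)):
        out["ref_idx_" + name] = ref_idx
        out["mvp_%s_flag" % name] = flag
        # MVD: difference between the PU's motion vector and the predictor.
        out["mvd_" + name] = (mv[0] - pred[0], mv[1] - pred[1])
    return out

syntax = amvp_signal(
    list0=((10, 4), (8, 4), 0, 0),   # list 0: MV, predictor, ref idx, flag
    list1=((-3, 1), (0, 0), 1, 1),   # list 1: MV, predictor, ref idx, flag
)
```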
- FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder (such as video decoder 200) in an embodiment of the present application.
- the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
- the video decoder may receive the first candidate prediction motion vector index and the second candidate prediction motion vector index.
- the first candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 0 motion vector of the current PU in the candidate prediction motion vector list.
- the second candidate prediction motion vector index indicates the position of the selected candidate prediction motion vector for the list 1 motion vector of the current PU in the candidate prediction motion vector list.
- a single syntax element may be used to identify two candidate prediction motion vector indexes.
- the video decoder may generate a list of candidate predicted motion vectors for the current PU (224).
- the video decoder may generate this candidate prediction motion vector list for the current PU in various ways.
- the video decoder may use the techniques described below with reference to FIGS. 8 to 12 to generate a list of candidate prediction motion vectors for the current PU.
- the video decoder may explicitly or implicitly set a reference picture index identifying the reference picture including the co-located PU, as described above with respect to FIG. 5.
- the video decoder may determine the motion information of the current PU based on the motion information indicated by one or more selected candidate prediction motion vectors in the candidate prediction motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct one or more motion vectors of the current PU using the one or more motion vectors indicated by the selected candidate prediction motion vector and the one or more MVDs indicated in the code stream.
- the reference image index and prediction direction identifier of the current PU may be the same as the reference image index and prediction direction identifier of the one or more selected candidate prediction motion vectors.
- the video decoder may generate a predictive image block for the current PU based on one or more reference blocks indicated by the motion information of the current PU (226).
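Decoder-side motion recovery (225) under the two modes described above can be sketched in a few lines: merge copies the selected candidate's motion, while AMVP adds the signaled MVD to the selected predictor. Function and mode names are assumptions for illustration.

```python
# Illustrative sketch of how the decoder determines the current PU's motion
# information: merge mode copies the candidate's motion; AMVP mode adds the
# signaled motion vector difference to the selected predictor.

def recover_motion(mode, selected_candidate_mv, mvd=None):
    if mode == "merge":
        # Merge: motion of the current PU equals the candidate's motion.
        return selected_candidate_mv
    if mode == "amvp":
        # AMVP: motion vector = selected predictor + signaled MVD.
        return (selected_candidate_mv[0] + mvd[0],
                selected_candidate_mv[1] + mvd[1])
    raise ValueError("unknown mode: %s" % mode)

merge_mv = recover_motion("merge", (7, -2))
amvp_mv = recover_motion("amvp", (7, -2), mvd=(1, 3))
```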
- FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and image blocks at adjacent positions associated with the coding unit in an embodiment of the present application, illustrating CU 250 and schematic candidate prediction motion vector positions 252A to 252E associated with CU 250.
- This application may collectively refer to the candidate prediction motion vector positions 252A to 252E as the candidate prediction motion vector positions 252.
- the candidate prediction motion vector position 252 indicates a spatial candidate prediction motion vector in the same image as the CU 250.
- the candidate prediction motion vector position 252A is positioned to the left of CU 250.
- the candidate prediction motion vector position 252B is positioned above CU 250.
- the candidate prediction motion vector position 252C is positioned at the upper right of CU 250.
- the candidate prediction motion vector position 252D is positioned at the lower left of CU 250.
- the candidate prediction motion vector position 252E is positioned at the upper left of CU 250.
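The five spatial positions just listed can be made concrete with sample coordinates. The exact sample placement below follows common HEVC-style neighbor conventions and is an assumed interpretation of FIG. 8, not taken from the patent figure itself.

```python
# Illustrative coordinates of the spatial candidate positions 252A-252E
# relative to a CU whose top-left corner is (x, y) with width w and height h.
# A: left, B: above, C: upper right, D: lower left, E: upper left (assumed
# HEVC-style placement, in sample units).

def spatial_candidate_positions(x, y, w, h):
    return {
        "252A": (x - 1, y + h - 1),  # left of the CU
        "252B": (x + w - 1, y - 1),  # above the CU
        "252C": (x + w, y - 1),      # upper right of the CU
        "252D": (x - 1, y + h),      # lower left of the CU
        "252E": (x - 1, y - 1),      # upper left of the CU
    }

# Example: a 32x32 CU at (64, 64).
positions = spatial_candidate_positions(64, 64, 32, 32)
```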
- FIG. 8 provides a schematic embodiment of a manner in which the inter prediction module 121 and the motion compensation module 162 may generate a candidate prediction motion vector list. The embodiments will be explained below with reference to the inter prediction module 121, but it should be understood that the motion compensation module 162 may implement the same technique and thus generate the same candidate prediction motion vector list.
- FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of the present application.
- the technique of FIG. 9 will be described with reference to a list including five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes.
- the five candidate prediction motion vectors may each have an index (e.g., 0 to 4).
- the technique of FIG. 9 will be described with reference to a general video decoder.
- a general video decoder may be, for example, a video encoder (e.g., video encoder 100) or a video decoder (e.g., video decoder 200).
- the video decoder first considers four spatial candidate prediction motion vectors (902).
- the four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D.
- the four spatial candidate prediction motion vectors correspond to motion information of four PUs in the same image as the current CU (for example, CU250).
- the video decoder may consider the four spatial candidate prediction motion vectors in the list in a particular order. For example, the candidate prediction motion vector position 252A may be considered first. If the candidate prediction motion vector position 252A is available, the candidate prediction motion vector position 252A may be assigned to index 0.
- if the candidate prediction motion vector position 252A is unavailable, the video decoder may not include the candidate prediction motion vector position 252A in the candidate prediction motion vector list.
- Candidate prediction motion vector positions may be unavailable for various reasons. For example, if the candidate prediction motion vector position is not within the current image, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is intra-predicted, the candidate prediction motion vector position may not be available. In another feasible implementation, if the candidate prediction motion vector position is in a slice different from the current CU, the candidate prediction motion vector position may not be available.
- the video decoder may next consider the candidate prediction motion vector position 252B. If the candidate prediction motion vector position 252B is available and different from the candidate prediction motion vector position 252A, the video decoder may add the candidate prediction motion vector position 252B to the candidate prediction motion vector list.
- the terms "same" and "different" refer to the motion information associated with candidate prediction motion vector positions. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information, and are considered different if they have different motion information. If the candidate prediction motion vector position 252A is not available, the video decoder may assign the candidate prediction motion vector position 252B to index 0.
- if the candidate prediction motion vector position 252A is available, the video decoder may assign the candidate prediction motion vector position 252B to index 1. If the candidate prediction motion vector position 252B is not available or is the same as the candidate prediction motion vector position 252A, the video decoder skips the candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
- the candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If the candidate prediction motion vector position 252C is available and not the same as the candidate prediction motion vector positions 252B and 252A, the video decoder assigns the candidate prediction motion vector position 252C to the next available index. If the candidate prediction motion vector position 252C is unavailable or is the same as at least one of the candidate prediction motion vector positions 252A and 252B, the video decoder does not include the candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers the candidate prediction motion vector position 252D.
- if the candidate prediction motion vector position 252D is available and not the same as the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns the candidate prediction motion vector position 252D to the next available index. If the candidate prediction motion vector position 252D is unavailable or is the same as at least one of the candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include the candidate prediction motion vector position 252D in the candidate prediction motion vector list.
- the foregoing describes considering the candidate prediction motion vectors 252A to 252D individually for inclusion in the candidate prediction motion vector list, but in some embodiments, all of the candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list later.
- the candidate prediction motion vector list may include four spatial candidate prediction motion vectors or the list may include less than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, Yes), the video decoder considers temporal candidate prediction motion vectors (906).
- the temporal candidate prediction motion vector may correspond to motion information of a co-located PU of a picture different from the current picture. If a temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4.
- if the temporal candidate prediction motion vector is unavailable or is the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector considered at block 906) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, Yes), the video decoder completes building the list.
- if the candidate prediction motion vector list includes four candidate prediction motion vectors (908, No), the video decoder may consider the fifth spatial candidate prediction motion vector (910).
- the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list and assign the fifth spatial candidate prediction motion vector to index 4.
- if the candidate prediction motion vector at position 252E is unavailable or is the same as one of the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. So after considering the fifth spatial candidate prediction motion vector (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910) or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
- the video decoder finishes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
- the video decoder may consider the fifth spatial candidate prediction motion vector (918).
- the fifth spatial candidate prediction motion vector may, for example, correspond to the candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, where the fifth spatial candidate prediction motion vector is assigned to the next available index.
- the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list.
- the video decoder may then consider the temporal candidate prediction motion vector (920). If the temporal candidate prediction motion vector is available and different from the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may add the temporal candidate prediction motion vector to the candidate prediction motion vector list, where the temporal candidate prediction motion vector is assigned to the next available index. If the temporal candidate prediction motion vector is not available or is identical to one of the candidate prediction motion vectors already included in the candidate prediction motion vector list, the video decoder may not include it in the candidate prediction motion vector list.
- if the candidate prediction motion vector list includes five candidate prediction motion vectors (922, Yes), the video decoder finishes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes fewer than five candidate prediction motion vectors (922, No), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, Yes).
- an additional merge candidate prediction motion vector may be artificially generated after the spatial candidate prediction motion vectors and the temporal candidate prediction motion vector to fix the size of the merge candidate prediction motion vector list to a specified number of merge candidate prediction motion vectors (for example, five in the foregoing feasible implementation of FIG. 9).
- the additional merge candidate prediction motion vectors may include an exemplary combined bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 1), a scaled bi-predictive merge candidate prediction motion vector (candidate prediction motion vector 2), and a zero-vector merge/AMVP candidate prediction motion vector (candidate prediction motion vector 3).
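The list-construction flow described above can be sketched as follows. This is a minimal, hypothetical illustration, not the standard's normative process; the names `MAX_MERGE_CANDS`, `build_merge_list`, and `make_artificial_candidate`, and the tuple representation of candidates, are assumptions introduced here.

```python
MAX_MERGE_CANDS = 5  # target list size in the example of FIG. 9

def build_merge_list(spatial_cands, temporal_cand, make_artificial_candidate):
    """Build a fixed-size merge candidate list: spatial candidates first,
    then the temporal candidate, then artificially generated candidates as
    filler. Duplicate or unavailable (None) candidates are skipped."""
    cand_list = []
    for cand in spatial_cands:                 # e.g. positions 252A-252E
        if cand is not None and cand not in cand_list:
            cand_list.append(cand)
        if len(cand_list) == MAX_MERGE_CANDS:
            return cand_list
    if temporal_cand is not None and temporal_cand not in cand_list:
        cand_list.append(temporal_cand)
    # Fill with artificially generated candidates until the list is full
    # (blocks 914/916 in the flow above).
    while len(cand_list) < MAX_MERGE_CANDS:
        cand_list.append(make_artificial_candidate(cand_list))
    return cand_list

cands = build_merge_list(
    spatial_cands=[(1, 0), (1, 0), (2, 3), None],  # one duplicate, one unavailable
    temporal_cand=(4, 4),
    make_artificial_candidate=lambda lst: (0, 0) if (0, 0) not in lst else (9, 9),
)
```

The filler callback stands in for the three artificial candidate types (combined, scaled, zero-vector) described next.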
- FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the combined bi-directional predictive merge candidate prediction motion vector may be generated by combining the original merge candidate prediction motion vector.
- two candidate prediction motion vectors (which have mvL0 and refIdxL0 or mvL1 and refIdxL1) among the original candidate prediction motion vectors may be used to generate a bidirectional predictive merge candidate prediction motion vector.
- two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
- the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
- the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
- mvL0_A and ref0 are picked from list 0
- mvL1_B and ref0 are picked from list 1
- a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may be generated, and it is checked whether it is different from the candidate prediction motion vectors already included in the candidate prediction motion vector list. If it is different, the video decoder may include the bi-predictive merge candidate prediction motion vector in the candidate prediction motion vector list.
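The combination step above can be sketched as follows, assuming each candidate is represented as a dict with optional list-0 and list-1 motion; the field names `"L0"`/`"L1"` and the (mv, refIdx) tuples are illustrative assumptions, not from the source.

```python
def combine_bi_predictive(cand_a, cand_b):
    """Combine the list-0 motion of one uni-directional candidate with the
    list-1 motion of another into a single bi-predictive candidate; return
    None if either side lacks the required motion."""
    if cand_a.get("L0") is None or cand_b.get("L1") is None:
        return None
    return {"L0": cand_a["L0"], "L1": cand_b["L1"]}

# As in FIG. 10: mvL0_A/ref0 picked from list 0, mvL1_B/ref0 picked from list 1.
a = {"L0": ((3, -1), 0), "L1": None}   # (mvL0_A, ref0), list-0 uni-prediction
b = {"L0": None, "L1": ((-2, 4), 0)}   # (mvL1_B, ref0), list-1 uni-prediction
combined = combine_bi_predictive(a, b)
```

As the text notes, the combined candidate would then be checked against the list before being added.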
- FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the scaled bi-directional predictive merge candidate prediction motion vector may be generated by scaling the original merge candidate prediction motion vector.
- a candidate prediction motion vector (which may have mvLX and refIdxLX) among the original candidate prediction motion vectors may be used to generate a bi-predictive merge candidate prediction motion vector.
- two candidate prediction motion vectors are included in the original merge candidate prediction motion vector list.
- the prediction type of one candidate prediction motion vector is List 0 unidirectional prediction
- the prediction type of the other candidate prediction motion vector is List 1 unidirectional prediction.
- mvL0_A and ref0 may be picked from list 0, and ref0 may be copied to the reference index ref0 ′ in list 1. Then, mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0 ′. The scaling may depend on the POC distance.
- a bi-predictive merge candidate prediction motion vector (which has mvL0_A and ref0 in list 0 and mvL0'_A and ref0' in list 1) can be generated and checked for duplication. If it is not a duplicate, it can be added to the merge candidate prediction motion vector list.
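The POC-distance scaling used to derive mvL0'_A from mvL0_A can be illustrated with simple linear scaling by the ratio of the two picture-order-count distances. This is a sketch of the idea only; the H.265 standard uses a clipped fixed-point approximation of the same ratio, and the POC values below are invented for the example.

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector by the ratio of POC distances: the distance to
    the target reference (ref0') over the distance to the source reference
    (ref0) that the original vector points at."""
    td = poc_cur - poc_ref_src   # distance to the reference mv points at
    tb = poc_cur - poc_ref_dst   # distance to the copied reference ref0'
    scale = tb / td
    return (round(mv[0] * scale), round(mv[1] * scale))

# mvL0_A points one picture back; ref0' is two pictures back, so the
# scaled vector doubles.
mvL0_A = (4, -2)
mvL0p_A = scale_mv(mvL0_A, poc_cur=8, poc_ref_src=7, poc_ref_dst=6)  # -> (8, -4)
```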
- FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to a merge mode candidate prediction motion vector list in an embodiment of the present application.
- the zero vector merge candidate prediction motion vector may be generated by combining the zero vector with a reference index that can be referred to. If the zero vector candidate prediction motion vector is not duplicated, it can be added to the merge candidate prediction motion vector list. For each generated merge candidate prediction motion vector, the motion information may be compared with the motion information of the previous candidate prediction motion vector in the list.
- if the motion information differs, the generated candidate prediction motion vector is added to the merge candidate prediction motion vector list.
- the process of determining whether the candidate prediction motion vector is different from the candidate prediction motion vector already included in the candidate prediction motion vector list is sometimes referred to as pruning.
- each newly generated candidate prediction motion vector can be compared with existing candidate prediction motion vectors in the list.
- the pruning operation may include comparing one or more new candidate prediction motion vectors with the candidate prediction motion vectors already in the candidate prediction motion vector list and not adding a new candidate prediction motion vector that duplicates a candidate prediction motion vector already in the list.
- the pruning operation may include adding one or more new candidate prediction motion vectors to the candidate prediction motion vector list and later removing duplicate candidate prediction motion vectors from the list. It should be understood that, in other feasible implementation manners, the foregoing pruning step may not be performed.
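The first pruning variant above (compare before adding) reduces to a one-line membership check; this sketch assumes candidates compare equal exactly when their motion information is identical.

```python
def prune_and_add(cand_list, new_cand):
    """Append new_cand only if no identical candidate is already present;
    duplicates are silently dropped (pruned)."""
    if new_cand not in cand_list:
        cand_list.append(new_cand)
    return cand_list

lst = [(1, 1)]
prune_and_add(lst, (1, 1))   # duplicate: not added
prune_and_add(lst, (2, 0))   # new motion information: added
```

The second variant (add first, deduplicate later) gives the same final list contents at the cost of a temporary over-full list.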
- the spatial candidate prediction modes are exemplified by the five positions 252A to 252E shown in FIG. 8.
- the spatial candidate prediction modes may further include, for example, positions that are within a preset distance from the image block to be processed but are not adjacent to the image block to be processed. Exemplarily, such positions may be shown as 252F to 252J in FIG. 13.
- FIG. 13 is an exemplary schematic diagram of a coding unit and the image blocks at positions associated with the coding unit in this embodiment of the present application. Positions that are located in image blocks in the same image frame as the image block to be processed, that have been reconstructed when the image block to be processed is processed, and that are not adjacent to the image block to be processed fall within the range of such positions.
- FIG. 14 is an exemplary flowchart of a method for predicting motion information according to an embodiment of the present application. Specifically, the method includes the following steps:
- the target pixel point includes a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed.
- FIG. 8 schematically shows the adjacent positions 252A-252E of the coding unit 250
- FIG. 13 schematically shows the non-adjacent positions 252F-252J of the coding unit 250.
- the foregoing position may be used to indicate both an image block covering the position and a pixel point at the position.
- the image block to be processed in the embodiment of the present application is a set of pixels to be processed, and is not limited to a coding unit, a coding subunit, or a prediction unit.
- the coding unit 250 may be used as the image block to be processed in this embodiment of the present application.
- the left side of the image block to be processed includes the position directly to the left of the image block to be processed (such as the position corresponding to 252A), and may also include the upper left (such as the position corresponding to 252D) and the lower left (such as the position corresponding to 252E).
- the position of the pixel point at the upper left vertex of the image block to be processed is set as the origin, the straight line where the upper edge of the image block to be processed is located is the horizontal axis with the rightward direction as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located is the vertical axis with the downward direction as the vertical positive direction, establishing a rectangular coordinate system.
- the position of the second candidate pixel point in this embodiment of the present application may be at least one of the following coordinate points in the coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where
- w and h are preset positive integers
- i is a positive integer
- j is a non-negative integer.
- the position of the second candidate pixel point may not include the upper left position of the image block to be processed; that is, the position of the second candidate pixel point may be at least one of the following coordinate points in the coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j+h-1), (-w×i-1, h×j+h-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
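The coordinate family above can be enumerated concretely. The function below is an illustration written for this document: the small ranges of i and j, and the use of a set of (x, y) tuples, are assumptions for demonstration; the coordinate formulas themselves are those listed in the text.

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the second-candidate coordinate points listed above, in the
    coordinate system with the block's top-left pixel as origin (x positive
    rightward, y positive downward)."""
    pts = set()
    for i in range(1, i_max + 1):            # i is a positive integer
        pts.add((-1, h * i - 1 + h))
        pts.add((-1, h * i + h))
        for j in range(0, j_max + 1):        # j is a non-negative integer
            pts.add((-w * i, h * j - 1))
            pts.add((-w * i - 1, h * j - 1))
            pts.add((-w * i, h * j))
            pts.add((-w * i - 1, h * j))
    return pts

# For an 8x8 block with i up to 1 and j up to 1:
pts = second_candidate_positions(w=8, h=8, i_max=1, j_max=1)
```

Every generated point has a negative abscissa, i.e. lies to the left of the block, consistent with the definition of the second candidate pixel point.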
- after the motion information of an image block is determined, it is stored in a motion vector matrix for use in processing subsequent image blocks.
- the entire frame of the image can be mapped to a set of pixel units with each 4x4 pixel point set as one pixel unit, where each 4x4 pixel point set corresponds to one piece of motion information; the motion information corresponding to each 4x4 pixel unit can then be extracted to form a motion information matrix corresponding to the original image, which can also be called a motion vector field.
- the matrix obtained through the above process, that is, by sampling the motion information matrix corresponding to the image where the image block to be processed is located, is referred to as a motion vector field in this embodiment of the present application, where w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field. It should be understood that, in this embodiment, the determination of w is independent of the width of the image block to be processed, and the determination of h is independent of the height of the image block to be processed.
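The sampling described above amounts to keeping every h-th row and every w-th column of the dense per-4x4-unit motion matrix. The sketch below is illustrative: the dense matrix is synthetic, and representing motion information as (row, col) tuples is an assumption made for the example.

```python
def sample_mv_field(mv_matrix, w, h):
    """Subsample a dense motion matrix with width interval w and height
    interval h (intervals are in 4x4-unit steps)."""
    return [row[::w] for row in mv_matrix[::h]]

# A 4x4-unit motion matrix for a tiny frame, sampled with w = h = 2.
dense = [[(r, c) for c in range(4)] for r in range(4)]
field = sample_mv_field(dense, w=2, h=2)   # 2x2 sampled motion vector field
```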
- this step includes: determining a coding unit among the previously reconstructed image blocks of the image where the image block to be processed is located, where the coding unit is located on the left side of the image block to be processed and is not adjacent to the image block to be processed.
- the pixel point at the lower left corner of the image block to be processed may be used as a reference point, and the straight line where the lower edge of the image block to be processed is located may be used as a reference straight line. One or more anchor points located to the left of the reference point and on the reference straight line are determined. The coding unit (or prediction unit) where each anchor point is located is determined, and at least one of the adjacent point above the upper left corner point of the coding unit and the adjacent point above the upper right corner point of the coding unit is taken as a target pixel point.
- a plurality of derived straight lines parallel to the reference straight line are sequentially determined according to a preset step size, and the derived straight lines are located below the image block to be processed. Taking the derived straight line as a new reference straight line, and taking the intersection of the derived straight line and the straight line at the left edge of the image block to be processed as a new reference point, repeating the steps of determining a target pixel point to obtain at least one new target pixel.
- the target pixel point is no longer obtained repeatedly.
- this embodiment determines the positions of at least two target pixel points having a preset positional relationship with the image block to be processed; the order of acquisition follows the preset order described below.
- the position range of the second candidate pixel point needs to be limited.
- the absolute value of the abscissa of the second candidate pixel point cannot exceed a boundary value; that is, w×i is less than or equal to the first threshold.
- the first threshold is equal to a width of a coding tree unit CTU in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- the embodiment of the present application does not limit the position of the first candidate pixel point adjacent to the image block to be processed.
- it may be a point at any one or more positions indicated by 252A-252E in FIG. 8.
- the acquiring at least two target pixel points having a preset positional relationship with the image block to be processed includes: acquiring, in accordance with a preset order, a plurality of second candidate pixel points among the at least two target pixel points.
- when variable-length encoding is performed on the index information of the second candidate pixel points, the order of acquisition is related to the bit consumption of the encoded index information. For example, the number of bits used to encode the index information of a second candidate pixel point acquired earlier is less than or equal to the number of bits used for a second candidate pixel point acquired later; that is, when the second candidate pixel point acquired earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the second candidate pixel point acquired later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
- the binary representation may be an encoded codeword, and the length of the binary representation is the codeword length of the encoded codeword.
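A truncated-unary binarization is one way to realize the property that earlier-acquired candidates never cost more bits; the specific code below is an assumption chosen for illustration, not the codeword table of any particular standard.

```python
def unary_codeword(index, max_index):
    """Truncated unary binarization: index k -> k ones followed by a
    terminating zero; the zero is dropped for the last possible index."""
    if index == max_index:
        return "1" * index
    return "1" * index + "0"

# Codewords for a five-entry candidate list (indices 0..4).
codewords = [unary_codeword(k, 4) for k in range(5)]
```

Because codeword length grows with the index, assigning earlier-acquired candidates smaller indices guarantees P ≤ Q as stated above.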
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or the order from right to left; or the order from top to bottom; or a polyline order (for example, from top right to bottom left).
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- FIG. 16 is a schematic diagram of the right-to-left order, in which positions are acquired row by row from top to bottom, and one by one from right to left within each row.
- FIG. 17 is a schematic diagram of the top-to-bottom order, in which positions are acquired column by column from right to left, and one by one from top to bottom within each column.
- Figure 18 is a schematic diagram of the polyline order from top right to bottom left. The numbers in the figure represent the order of acquisition: the smaller the number, the earlier the position is acquired. It should be understood that when the second candidate pixel point at a certain position is not acquired, that position is skipped, and the other positions are still acquired in ascending order of their numbers.
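The short-to-long distance order defined above can be implemented as a sort on |x| + |y|. The positions used below are invented for the example, and tie handling (Python's stable sort keeps input order for equal distances) is an arbitrary choice not specified by the text.

```python
def distance_order(positions):
    """Sort candidate positions by the sum of absolute coordinates in the
    block's coordinate system (short-to-long distance order)."""
    return sorted(positions, key=lambda p: abs(p[0]) + abs(p[1]))

# Distances: (-9,8) -> 17, (-1,15) -> 16, (-8,0) -> 8, (-1,7) -> 8.
order = distance_order([(-9, 8), (-1, 15), (-8, 0), (-1, 7)])
```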
- the obtained position of the target pixel point or the motion information corresponding to the target pixel point is sent to a candidate motion information list.
- the specific implementation manner may be the same as the construction method of the candidate motion vector list used in the Merge or AMVP mode in the aforementioned H.265 standard technology.
- a pruning operation is performed; that is, the acquiring at least two target pixel points having a preset positional relationship with the image block to be processed includes: sequentially acquiring candidate pixel points having the preset positional relationship with the image block to be processed; determining that the motion information of a currently acquired candidate pixel point is different from the motion information of the already acquired target pixel points; and using the candidate pixel point with different motion information as a target pixel point.
- this is the pruning operation described in the foregoing, and details are not described again.
- the position of the target pixel point or the motion information corresponding to the target pixel point may be directly added to the candidate motion information list; that is, no pruning operation is performed.
- among the obtained at least two target pixel points, there may be a case where the motion information of at least two target pixel points is the same, and there may also be a case where the motion information of any two target pixel points is different.
- the number of the obtained target pixel points may be limited to a preset second threshold.
- the selection of the second threshold may be determined according to a specific implementation. For example, the second threshold is reasonably determined to ensure that the number of target pixel points obtained is a fixed value, or that the number of target pixel points added to the candidate motion information list is a fixed value, or that the total amount of motion information in the candidate motion information list is a fixed value.
- the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points. When the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N; when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, and N is less than or equal to M.
- the target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, and different motion information is distinguished by different index numbers. Different index numbers have different binary representations, and the binary representations can be encoded codewords. In the embodiment of the present application, the length of the encoded codeword corresponding to the index number of the first candidate pixel point is less than or equal to the length of the encoded codeword corresponding to the index number of the second candidate pixel point.
- the embodiment of the present application adds new candidate motion information. Therefore, the embodiment of the present application can be used to improve the Merge technology or the AMVP technology.
- similar to the Merge technology, the target motion information may be used as the motion information of the image block to be processed; similar to the AMVP technology, the target motion information may be used as a prediction value of the motion information of the image block to be processed, and the motion information of the image block to be processed is obtained by combining the prediction value with the motion information difference.
- the method may be used to decode an image block to be processed.
- the method further includes: parsing a code stream to obtain target motion residual information; and correspondingly, the predicting the motion information of the image block to be processed based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the motion information in the embodiment of the present application may refer to a motion vector.
- this step is to add the prediction value of the motion vector of the image block to be processed, indicated by the target identification information, and the residual value of the motion vector obtained by parsing, to obtain the motion vector of the image block to be processed.
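This step reduces to a component-wise addition of predictor and residual; the concrete values below are invented for the example.

```python
def reconstruct_mv(mvp, mvd):
    """Add the motion vector prediction value (selected by the target
    identification information) and the parsed motion vector residual."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

mv = reconstruct_mv(mvp=(4, -2), mvd=(-1, 3))   # -> (3, 1)
```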
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method may be used to encode the image block to be processed; before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least encoding cost; and correspondingly, the obtaining the target identification information includes: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the method further includes encoding the acquired target identification information and encoding the target motion residual information.
- FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of the present application. Specifically, the method includes the following steps:
- the target pixel point includes a candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed, and when the prediction mode of the image block where the target pixel point is located is intra prediction, the target pixel point is unavailable.
- the determination of availability is based on factors such as the prediction mode of the image block where the target pixel point is located, whether the target pixel point is within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vector corresponding to another position (for example, in the H.265 standard, for the rectangular block mode, when the candidate prediction block of the Merge mode is determined).
- when the prediction mode of the image block where the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point is located outside the edge of the image or the edge of the slice where the image block to be processed is located, the target pixel point is also unavailable.
- the preset positional relationship may include an adjacent positional relationship and a non-adjacent positional relationship with the image block to be processed, as shown in FIG. 8 and FIG. 13, respectively.
- the second candidate pixel point located on the left side of the image block to be processed and not adjacent to the image block to be processed has been discussed in detail; the target pixel point includes the second candidate pixel point in the embodiment shown in FIG. 14, and details are not described herein again.
- the availability of the target pixel point may be determined, that is, the prediction mode of the image block corresponding to the coordinates of the target pixel point is checked.
- alternatively, the availability of the image block where the target pixel point is located may be determined, that is, the prediction mode of the image block indicated by the coordinates of a point is checked. This point can be the upper left corner point of the image block, the center point of the image block, or the target pixel point, which is not limited.
- the condition for determining availability includes: when the prediction mode of the image block where the target pixel point is located is intra prediction, the target pixel point is unavailable. It should be understood that when the position of the target pixel point is outside the edge of the image, or outside the edge of the slice, the target pixel point does not actually exist, or its value is obtained by derivation rather than reconstruction; in a feasible implementation manner, such a target pixel point is also considered unavailable.
- the position of the candidate pixel point includes: taking the position of the pixel point at the upper left vertex of the image block to be processed as the origin, the straight line where the upper edge of the image block to be processed is located as the horizontal axis with the rightward direction as the horizontal positive direction, and the straight line where the left edge of the image block to be processed is located as the vertical axis with the downward direction as the vertical positive direction, at least one of the following coordinate points in the rectangular coordinate system.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- the motion vector field is obtained by sampling the motion information matrix corresponding to the image where the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w ⁇ i is less than or equal to the first threshold.
- the first threshold value is equal to a width of a coding tree unit CTU where the image block to be processed is located, or the first threshold value is equal to twice the width of the CTU.
- the candidate motion information set includes: adding motion information corresponding to a plurality of available candidate pixel points to the candidate motion information set of the image block to be processed according to a preset order, where, when a candidate pixel point obtained earlier corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is a length of a straight line segment connecting the second candidate pixel point and a pixel point at a vertex position of a lower left corner of the image block to be processed.
- the candidate motion information set includes at least two identical motion information.
- the adding the available motion information corresponding to the target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and adding the available target pixel point with different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the target identification information is used to determine target motion information from the candidate motion information set.
- the target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, and different motion information is distinguished by different index numbers.
- the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
- the method is used to decode the image block to be processed, and further includes: parsing a code stream to obtain target motion residual information; correspondingly, the predicting the motion information of the image block to be processed based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
- the method is used to encode the image block to be processed, and before the obtaining the target identification information, the method further includes: determining a combination of target motion information and target motion residual information with the least coding cost.
- the obtaining target identification information includes: obtaining identification information of the target motion information with the least coding cost among the at least two target motion information.
- the method further includes encoding the acquired target identification information.
- the method further includes encoding the target motion residual information.
- FIG. 32 is an exemplary structural block diagram of a motion information prediction device 3200 according to an embodiment of the present application, and specifically includes the following modules:
- An obtaining module 3201 is configured to obtain at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and the target pixel point located at the to-be-processed image block. Processing a second candidate pixel point on the left side of the image block and not adjacent to the image block to be processed;
- the indexing module 3202 is configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points; when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M;
- a calculation module 3203 is configured to predict motion information of the image block to be processed according to the target motion information.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
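When the identification information is binarized with a variable-length code, the N ≤ M property above falls out naturally: candidates listed earlier receive codewords no longer than candidates listed later. The sketch below uses truncated unary coding as one hypothetical binarization; the binarization a real codec uses may differ.

```python
def truncated_unary(index, max_index):
    """Binarize `index` as truncated unary: `index` ones followed by a
    terminating zero, with the zero omitted for the last index."""
    if index < max_index:
        return "1" * index + "0"
    return "1" * index

# Codeword lengths never decrease with the candidate index, so an
# adjacent first candidate placed early gets a representation of
# length N that is at most the length M of a later, non-adjacent one.
codewords = [truncated_unary(i, 4) for i in range(5)]
lengths = [len(c) for c in codewords]
```

With this binarization, placing the adjacent candidate before the non-adjacent ones guarantees N ≤ M for every possible choice of target motion information.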
- the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
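The coordinate templates above can be enumerated directly. A minimal sketch, assuming w and h are either the block dimensions or the motion vector field sampling intervals, and using the stated convention (origin at the upper left vertex, x positive to the right, y positive downward):

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the listed coordinate templates for i in 1..i_max and
    j in 0..j_max; every point lies to the left of the block (x < 0)."""
    points = []
    for i in range(1, i_max + 1):
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(j_max + 1):
            points.append((-w * i, h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i, h * j))
            points.append((-w * i - 1, h * j))
    return points

pts = second_candidate_positions(w=8, h=8, i_max=2, j_max=1)
```

The limits i_max and j_max are hypothetical parameters standing in for whatever range constraint (such as the first threshold below) the codec imposes.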
- w×i is less than or equal to the first threshold.
- the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- the obtaining module 3201 is specifically configured to obtain, in a preset order, multiple second candidate pixel points among the at least two target pixel points, wherein, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q;
- the preset order includes: a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or, the order from right to left; or, the order from top to bottom; or, a polyline order from upper right to lower left.
- the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
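Both distance definitions above can drive the short-to-long ordering: the sum of absolute coordinates (an L1 distance to the origin) or the Euclidean length of the segment to the lower left vertex, which in the stated coordinate system sits at (0, h). A hypothetical sketch:

```python
import math

def order_by_distance(points, h, metric="l1"):
    """Sort candidate points from short to long distance. "l1" sums the
    absolute coordinates; "segment" measures the straight line to the
    lower left vertex of the block, located at (0, h)."""
    if metric == "l1":
        key = lambda p: abs(p[0]) + abs(p[1])
    else:
        key = lambda p: math.hypot(p[0], p[1] - h)
    return sorted(points, key=key)

ordered = order_by_distance([(-9, 0), (-1, 15), (-8, 7)], h=8)
```

Because candidates earlier in this order receive shorter codewords, the metric chosen directly shapes the codeword assignment described above.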
- among the obtained target pixel points, the motion information of at least two target pixel points is the same.
- the obtaining module 3201 is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel points with different motion information as the target pixel points.
- the number of the obtained target pixel points is a preset second threshold.
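The pruning rule described above — admit a candidate only when its motion information differs from all already-admitted candidates, and stop once the preset second threshold is reached — can be sketched with motion vectors standing in for full motion information:

```python
def prune_candidates(candidate_mvs, second_threshold):
    """Keep candidate motion vectors that differ from every previously
    kept one, until `second_threshold` entries have been collected."""
    kept = []
    for mv in candidate_mvs:
        if mv not in kept:
            kept.append(mv)
        if len(kept) == second_threshold:
            break
    return kept

pruned = prune_candidates([(1, 0), (1, 0), (2, -1), (1, 0), (0, 3)], 3)
```

In a full codec the comparison would cover the complete motion information (motion vector plus reference index), not just the vector used in this sketch.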
- the calculation module 3203 is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device 3200 is configured to decode the image block to be processed, and the indexing module 3202 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3203 is specifically configured to: combine the target motion information and the target motion residual information to obtain motion information of the image block to be processed.
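For motion vectors, combining the target motion information with the parsed motion residual reduces to a component-wise addition, as in AMVP-style prediction. A minimal sketch of the decoder-side combination performed by the calculation module:

```python
def combine_motion_info(target_mv, residual_mv):
    """Reconstruct the motion vector of the block to be processed from
    the target motion vector (predictor) and the parsed residual (MVD)."""
    return (target_mv[0] + residual_mv[0], target_mv[1] + residual_mv[1])

mv = combine_motion_info(target_mv=(4, -2), residual_mv=(-1, 3))
```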
- the indexing module 3202 is specifically configured to: parse the code stream to obtain the target identification information.
- the device 3200 is configured to encode the image block to be processed, and the obtaining module 3201 is further configured to: determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly,
- the indexing module 3202 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module 3202 is further configured to encode the obtained target identification information.
- the indexing module 3202 is further configured to: encode the target motion residual information.
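On the encoder side, the modules above search for the predictor/residual combination with the least coding cost. The sketch below uses a toy cost model (identification codeword bits plus the L1 magnitude of the residual); a real encoder would use a rate-distortion cost instead:

```python
def choose_cheapest(candidates, codeword_bits, true_mv):
    """Return (cost, index, residual) of the candidate predictor whose
    codeword bits plus residual magnitude is smallest."""
    best = None
    for idx, mvp in enumerate(candidates):
        residual = (true_mv[0] - mvp[0], true_mv[1] - mvp[1])
        cost = codeword_bits[idx] + abs(residual[0]) + abs(residual[1])
        if best is None or cost < best[0]:
            best = (cost, idx, residual)
    return best

cost, idx, residual = choose_cheapest(
    candidates=[(0, 0), (3, 1)], codeword_bits=[1, 2], true_mv=(3, 2))
```

In this toy example the second candidate wins despite its longer codeword, because its residual is much cheaper to signal.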
- FIG. 33 is another exemplary structural block diagram of the motion information prediction device 3300 in the embodiment of the present application, and specifically includes the following modules:
- a detection module 3301 is configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel points include candidate pixel points located on the left side of the image block to be processed and not adjacent to it; when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable;
- An obtaining module 3302 configured to add available motion information corresponding to the target pixel point to a candidate motion information set of the image block to be processed;
- An indexing module 3303 configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set;
- a calculation module 3304 is configured to predict motion information of the image block to be processed according to the target motion information.
- the detection module 3301 is specifically configured to determine availability of an image block where the target pixel point is located.
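The availability rule states that a target pixel point is unavailable when the block containing it was intra predicted, since such a block carries no motion information. A sketch of the per-block check performed by the detection module (treating a block outside the picture as unavailable is an added assumption here, not stated in this passage):

```python
def is_available(block):
    """A target pixel point is available only when the block containing
    it exists and was not intra predicted."""
    if block is None:  # e.g. the pixel falls outside the picture
        return False
    return block.get("mode") != "intra"

blocks = [{"mode": "inter", "mv": (1, 0)}, {"mode": "intra"}, None]
flags = [is_available(b) for b in blocks]
```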
- the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- w is the width of the image block to be processed
- h is the height of the image block to be processed.
- a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- w×i is less than or equal to the first threshold.
- the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
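The first threshold caps how far to the left candidates may lie: w×i must not exceed the CTU width (or twice it), which bounds the motion-information buffer a decoder must retain. A sketch of the largest admissible i under that rule, with illustrative numbers:

```python
def max_column_index(w, ctu_width, factor=1):
    """Largest i satisfying w * i <= factor * ctu_width, i.e. the
    leftmost admissible column of second candidate pixel points."""
    return (factor * ctu_width) // w

i_one = max_column_index(w=8, ctu_width=128)            # threshold = CTU width
i_two = max_column_index(w=8, ctu_width=128, factor=2)  # threshold = 2 x CTU width
```

The 8-pixel sampling interval and 128-wide CTU are illustrative values, not figures taken from this application.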
- the obtaining module 3302 is specifically configured to: add, in a preset order, the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed, wherein, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- the binary representation of the target identification information includes an encoded codeword of the target identification information.
- the preset order includes a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system.
- the distance is the length of the straight line segment connecting the candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
- the candidate motion information set includes at least two pieces of identical motion information.
- the obtaining module 3302 is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of the currently obtained available target pixel point is different from the motion information in the candidate motion information set; and add the motion information of the available target pixel points with different motion information to the candidate motion information set of the image block to be processed.
- the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- the calculation module 3304 is specifically configured to use the target motion information as the motion information of the image block to be processed.
- the device 3300 is configured to decode the image block to be processed, and the indexing module 3303 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3304 is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- the indexing module 3303 is specifically configured to: parse the code stream to obtain the target identification information.
- the device 3300 is configured to encode the image block to be processed, and the obtaining module 3302 is further configured to: determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly,
- the indexing module 3303 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- the indexing module 3303 is further configured to: encode the obtained target identification information.
- the indexing module 3303 is further configured to: encode the target motion residual information.
- FIG. 34 is a schematic structural block diagram of a motion information prediction device 3400 in an embodiment of the present application. Specifically, it includes a processor 3401 and a memory 3402 coupled to the processor; the processor 3401 is configured to perform the method of the embodiment shown in FIG. 14 or FIG. 32 and its various feasible implementations.
- the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit.
- the computer-readable medium may include a computer-readable storage medium or a communication medium; the computer-readable storage medium corresponds to a tangible medium such as a data storage medium, and the communication medium includes any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
- computer-readable media may illustratively correspond to (1) non-transitory, tangible computer-readable storage media, or (2) a communication medium such as a signal or carrier wave.
- a data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and / or data structures used to implement the techniques described in this application.
- the computer program product may include a computer-readable medium.
- the computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
- if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
- computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transitory, tangible storage media.
- magnetic disks and optical discs, as used herein, include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
- the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the aforementioned structures or any other structure suitable for implementing the techniques described herein.
- functionality described herein may be provided within dedicated hardware and / or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
- the techniques of this application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chipset).
- Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
Abstract
A method for predicting motion information concerning an image block. Said method comprises: determining the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, the target pixel point comprising candidate pixel points that are located at the left side of said image block and are not adjacent to said image block, and when a prediction mode of the image block where the target pixel point is located is intra-frame prediction, the target pixel point being unavailable; adding motion information corresponding to the available target pixel point into a candidate motion information set of said image block to be processed; acquiring target identification information, the target identification information being used to determine target motion information from the candidate motion information set; and predicting, according to the target motion information, the motion information concerning said image block.
Description
The present application relates to the field of video image technology, and in particular, to an inter prediction method and device.
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and in extensions of those standards, to transmit and receive digital video information more efficiently. Video devices can implement these video codec techniques to transmit, receive, encode, decode, and/or store digital video information more efficiently.
Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove the redundancy inherent in video sequences. For block-based video decoding, a video slice may be partitioned into video blocks, which may also be referred to as tree blocks, coding units (CUs), and/or decoding nodes. Video blocks in an intra-decoded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-decoded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
Summary of the invention
The embodiments of the present application provide an inter prediction method and apparatus that select suitable candidate motion information as the motion information predictor of an image block to be processed, which improves the effectiveness of motion information prediction and the efficiency of encoding and decoding.
It should be understood that, in general, motion information includes a motion vector and index information of the reference frame to which the motion vector points, among other things. In a feasible implementation of the embodiments of the present application, predicting motion information refers to predicting a motion vector.
According to a first aspect of the embodiments of the present application, a method for predicting motion information of an image block is provided, including: obtaining at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located on the left side of the image block to be processed and not adjacent to it; obtaining target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M; and predicting the motion information of the image block to be processed according to the target motion information.
A beneficial effect of this implementation is that the motion information of non-adjacent image blocks on the left side of the block to be processed is used as candidate motion information for the block, which exploits more spatial prior coding information and improves coding performance.
In a feasible implementation, the binary representation of the target identification information includes the encoded codeword of the target identification information.
A beneficial effect of this implementation is that, when the candidate predicted motion information is represented with variable-length coding, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Appropriately determining the acquisition order of the target pixel points, according to the correlation between their motion information and the motion information of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel point at the upper left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block lies (right being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block lies (down being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
A beneficial effect of this implementation is that it offers multiple possibilities for selecting the second candidate pixel point according to actual coding requirements, allowing a balance between performance, complexity, and hardware and software cost.
In a feasible implementation, w is the width of the image block to be processed and h is its height.
A beneficial effect of this implementation is that selecting the position of the second candidate pixel point according to the size of the image block to be processed matches the local motion characteristics of the block, making the selection more reasonable.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located; w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
A beneficial effect of this implementation is that the selection of second candidate pixel positions is kept consistent with the distribution of motion information in the motion vector field, ensuring balanced position selection.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
A beneficial effect of this implementation is that restricting the selection range of second candidate pixel positions ensures a balance between coding performance and storage space.
In a feasible implementation, there are multiple second candidate pixel points, and obtaining the at least two target pixel points having a preset positional relationship with the image block to be processed includes: obtaining, in a preset order, the multiple second candidate pixel points among the at least two target pixel points, where, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
In a feasible implementation, the preset order includes: a short-to-long distance order, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or, the order from right to left; or, the order from top to bottom; or, a polyline order from upper right to lower left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel point at the lower left vertex of the image block to be processed.
A beneficial effect of this implementation is that, when variable-length coding is used for the motion information corresponding to the second candidate pixel points, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Appropriately determining the acquisition order, according to the correlation between the motion information of the second candidate pixel points and the motion information of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
In a feasible implementation, among the at least two obtained target pixel points, the motion information of at least two target pixel points is the same.
A beneficial effect of this implementation is that no pruning operation is performed when constructing the candidate motion information list, which saves complexity.
In a feasible implementation, obtaining the at least two target pixel points having a preset positional relationship with the image block to be processed includes: sequentially obtaining candidate pixel points having the preset positional relationship with the image block to be processed; determining that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and using the candidate pixel points with different motion information as the target pixel points.
A beneficial effect of this implementation is that the pruning operation removes redundant information from the candidate motion information list and improves coding efficiency.
In a feasible implementation, the number of obtained target pixel points is a preset second threshold.
A beneficial effect of this implementation is that limiting the number of obtained target pixel points balances coding performance against hardware and software cost, and in some specific implementations also avoids the decoder instability caused by an uncertain total number of entries in the candidate motion information list.
在一种可行的实施方式中,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:将所述目标运动信息作为所述待处理图像块的运动信息。In a feasible implementation manner, the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
在一种可行的实施方式中,所述方法用于解码所述待处理图像块,还包括:解析码流以获得目标运动残差信息;对应的,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。In a feasible implementation manner, the method is used to decode the image block to be processed, further comprising: analyzing a code stream to obtain target motion residual information; and correspondingly, predicting the target motion information based on the target motion information. The motion information of the image block to be processed includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the image block to be processed, and before the obtaining target identification information, the method further includes: determining the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the obtaining target identification information includes: obtaining the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the obtained target identification information is encoded.
In a feasible implementation, the target motion residual information is encoded.
The foregoing feasible implementations apply the motion vector prediction method of this application to decoding and encoding methods for obtaining the motion vector of an image block to be processed, namely the merge prediction mode (Merge) and the advanced motion vector prediction (AMVP) mode, improving the coding performance and efficiency of the original methods.
In a second aspect of the embodiments of this application, a method for predicting motion information of an image block is provided, including: determining the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel point includes a candidate pixel point that is located to the left of the image block to be processed and is not adjacent to it, and where the target pixel point is unavailable when the prediction mode of the image block in which it is located is intra prediction; adding the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; obtaining target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the image block to be processed according to the target motion information.
A beneficial effect of this implementation is that the motion information of non-adjacent image blocks to the left of the block to be processed is used as candidate motion information of the block, exploiting more spatial prior coding information and improving coding performance.
In a feasible implementation, the determining the availability of at least one target pixel point having a preset positional relationship with the image block to be processed includes: determining the availability of the image block in which the target pixel point is located.
It should be understood that the availability determination is based on factors such as the prediction mode of the image block in which the target pixel point is located, whether the target pixel point lies within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as the motion vectors corresponding to other positions (for example, the way candidate prediction blocks of the Merge mode are determined for rectangular partitioning in the H.265 standard). In a feasible implementation, in general, the target pixel point is available when the prediction mode of the image block in which it is located is inter prediction; however, the target pixel point is also unavailable when it lies beyond the edge of the image, or beyond the edge of the slice, containing the image block to be processed.
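The availability rules just described can be illustrated with the following sketch. All names, the string mode constants, and the rectangular slice representation are assumptions for illustration only; the actual determination in a codec also involves the partitioning-dependent factors mentioned above, which are omitted here.

```python
INTRA, INTER = "intra", "inter"

def target_available(x, y, block_mode, pic_w, pic_h, slice_rect):
    """Illustrative availability test for a target pixel at picture
    coordinates (x, y). slice_rect = (x0, y0, x1, y1), half-open."""
    if not (0 <= x < pic_w and 0 <= y < pic_h):
        return False  # outside the edge of the image
    x0, y0, x1, y1 = slice_rect
    if not (x0 <= x < x1 and y0 <= y < y1):
        return False  # outside the edge of the slice
    return block_mode == INTER  # intra-coded blocks carry no motion information
```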
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
A beneficial effect of this implementation is that it offers multiple possibilities for selecting candidate pixel points according to actual coding requirements, achieving a balance among performance, complexity, and software and hardware cost.
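The coordinate formulas above can be enumerated directly. The sketch below generates the listed left-side candidate positions for given w, h and index ranges; the function name, the flat list return type, and the choice of upper bounds for i and j are assumptions for illustration.

```python
def candidate_positions(w, h, max_i, max_j):
    """Coordinate system: origin at the block's top-left pixel,
    x positive rightward, y positive downward."""
    points = []
    for i in range(1, max_i + 1):          # i is a positive integer
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, max_j + 1):      # j is a non-negative integer
            points.append((-w * i,     h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i,     h * j))
            points.append((-w * i - 1, h * j))
    return points
```

With w and h set to the block's width and height, positions for which w×i exceeds the first threshold (for example, the CTU width) would additionally be excluded, in line with the range restriction described later in this section.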
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
A beneficial effect of this implementation is that selecting the positions of candidate pixel points according to the size of the image block to be processed matches the local motion characteristics of the block, making the selection more reasonable.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
A beneficial effect of this implementation is that the selection of candidate pixel positions is kept consistent with the distribution of motion information in the motion vector field, ensuring a balanced choice of positions.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
A beneficial effect of this implementation is that restricting the range from which candidate pixel positions are selected ensures a balance between coding performance and storage space.
In a feasible implementation, there are multiple candidate pixel points and the multiple candidate pixel points are available, and the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: adding the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
A beneficial effect of this implementation is that, when the motion information corresponding to each candidate pixel point is represented with variable-length coding, motion information earlier in the order is encoded with shorter codewords and motion information later in the order with longer codewords. Deciding the acquisition order appropriately, according to the correlation between the motion information of the candidate pixel points and that of the image block to be processed, helps select a better codeword coding strategy and improves coding performance.
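The short-to-long distance order and its codeword-length consequence can be illustrated as follows. Truncated-unary binarization is used here only as an assumed example of a variable-length index code; the text does not mandate this particular scheme.

```python
def order_by_distance(points):
    """Sort candidate positions by |x| + |y| (sum of absolute coordinates
    in the block's coordinate system), shortest distance first."""
    return sorted(points, key=lambda p: abs(p[0]) + abs(p[1]))

def truncated_unary_length(index, list_size):
    """Bins needed for a candidate index under truncated-unary coding:
    index k costs k + 1 bins, except the last index, which saves one."""
    return min(index + 1, list_size - 1)

ordered = order_by_distance([(-9, 0), (-1, 3), (-5, -1)])
# earlier (closer) candidates receive codewords no longer than later ones,
# matching the P <= Q property stated above
lengths = [truncated_unary_length(k, len(ordered)) for k in range(len(ordered))]
```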
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
A beneficial effect of this implementation is that no pruning operation is performed when constructing the candidate motion information list, saving complexity.
In a feasible implementation, the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the image block to be processed includes: sequentially obtaining the available target pixel points; determining that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and adding the available target pixel point having the different motion information to the candidate motion information set of the image block to be processed.
A beneficial effect of this implementation is that the pruning operation removes redundant information from the candidate motion information list, improving coding efficiency.
In a feasible implementation, the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
A beneficial effect of this implementation is that limiting the number of obtained target pixel points balances coding performance against software and hardware cost, and in some specific implementations also avoids the decoder instability caused by an indeterminate total size of the candidate motion information list.
In a feasible implementation, the predicting the motion information of the image block to be processed according to the target motion information includes: using the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the method is used to decode the image block to be processed and further includes: parsing a bitstream to obtain target motion residual information. Correspondingly, the predicting the motion information of the image block to be processed according to the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the image block to be processed, and before the obtaining target identification information, the method further includes: determining the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the obtaining target identification information includes: obtaining the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the method further includes: encoding the obtained target identification information.
In a feasible implementation, the method further includes: encoding the target motion residual information.
The foregoing feasible implementations apply the motion vector prediction method of this application to decoding and encoding methods for obtaining the motion vector of an image block to be processed, namely the merge prediction mode and the advanced motion vector prediction mode, improving the coding performance and efficiency of the original methods.
In a third aspect of the embodiments of this application, an apparatus for predicting motion information is provided, including: an obtaining module, configured to obtain at least two target pixel points having a preset positional relationship with an image block to be processed, where the target pixel points include a first candidate pixel point adjacent to the image block to be processed and a second candidate pixel point located to the left of the image block to be processed and not adjacent to it; an indexing module, configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, the length of the binary representation of the target identification information is N when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M when the second candidate pixel point corresponds to the target motion information, and N is less than or equal to M; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are multiple second candidate pixel points, and the obtaining module is specifically configured to: obtain the multiple second candidate pixel points among the at least two target pixel points in the preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained second candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
In a feasible implementation, among the obtained at least two target pixel points, the motion information of at least two target pixel points is identical.
In a feasible implementation, the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of a currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel point having the different motion information as a target pixel point.
In a feasible implementation, the number of obtained target pixel points is a preset second threshold.
In a feasible implementation, the calculation module is specifically configured to: use the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to: parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the indexing module is specifically configured to: parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to: determine the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the indexing module is specifically configured to: obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the indexing module is further configured to: encode the obtained target identification information.
In a feasible implementation, the indexing module is further configured to: encode the target motion residual information.
In a fourth aspect of the embodiments of this application, an apparatus for predicting motion information is provided, including: a detection module, configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, where the target pixel point includes a candidate pixel point that is located to the left of the image block to be processed and is not adjacent to it, and the target pixel point is unavailable when the prediction mode of the image block in which it is located is intra prediction; an obtaining module, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; an indexing module, configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
In a feasible implementation, the detection module is specifically configured to: determine the availability of the image block in which the target pixel point is located.
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line along the top edge of the image block to be processed with rightward as the positive horizontal direction, and whose vertical axis is the line along the left edge of the image block to be processed with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is its height.
In a feasible implementation, a motion vector field is obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is its sampling height interval.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are multiple candidate pixel points and the multiple candidate pixel points are available, and the obtaining module is specifically configured to: add the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed in a preset order, where the length of the binary representation of the target identification information is P when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q when a later-obtained candidate pixel point corresponds to the target motion information, and P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distance from short to long, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of a second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a zigzag order from top right to bottom left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex of the image block to be processed.
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
In a feasible implementation, the obtaining module is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of a currently obtained available target pixel point is different from the motion information in the candidate motion information set of the image block to be processed; and add the available target pixel point having the different motion information to the candidate motion information set of the image block to be processed.
In a feasible implementation, the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
In a feasible implementation, the calculation module is specifically configured to: use the target motion information as the motion information of the image block to be processed.
In a feasible implementation, the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to: parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module is specifically configured to: combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
In a feasible implementation, the indexing module is specifically configured to: parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to: determine the combination of target motion information and target motion residual information with the smallest coding cost. Correspondingly, the indexing module is specifically configured to: obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the smallest coding cost.
In a feasible implementation, the indexing module is further configured to: encode the obtained target identification information.
In a feasible implementation, the indexing module is further configured to: encode the target motion residual information.
According to a fifth aspect of the embodiments of this application, a motion information prediction device is provided, including a processor and a memory coupled to the processor, where the processor is configured to perform the method according to the first aspect or the second aspect.
According to a sixth aspect of the embodiments of this application, a computer-readable storage medium is provided, where the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to the first aspect or the second aspect.
According to a seventh aspect of the embodiments of this application, a computer program product including instructions is provided, which, when run on a computer, causes the computer to perform the method according to the first aspect or the second aspect.
It should be understood that the third to seventh aspects of this application are consistent with the technical solutions of the first aspect or the second aspect of this application, and the beneficial effects achieved by each aspect and the corresponding implementable designs are similar; details are not described again.
FIG. 1 is a schematic block diagram of a video encoding and decoding system according to an embodiment of this application;
FIG. 2 is a schematic block diagram of a video encoder according to an embodiment of this application;
FIG. 3 is a schematic block diagram of a video decoder according to an embodiment of this application;
FIG. 4 is a schematic block diagram of an inter prediction module according to an embodiment of this application;
FIG. 5 is an example flowchart of a merge prediction mode according to an embodiment of this application;
FIG. 6 is an example flowchart of an advanced motion vector prediction mode according to an embodiment of this application;
FIG. 7 is an example flowchart of motion compensation performed by a video decoder according to an embodiment of this application;
FIG. 8 is an example schematic diagram of a coding unit and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 9 is an example flowchart of constructing a candidate predicted motion vector list according to an embodiment of this application;
FIG. 10 is an example schematic diagram of adding a combined candidate motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 11 is an example schematic diagram of adding a scaled candidate motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 12 is an example schematic diagram of adding a zero motion vector to a merge-mode candidate predicted motion vector list according to an embodiment of this application;
FIG. 13 is another example schematic diagram of a coding unit and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 14 is an example flowchart of a motion information prediction method according to an embodiment of this application;
FIG. 15 is an example schematic diagram of an image block to be processed and the neighboring image blocks associated with it according to an embodiment of this application;
FIG. 16 is an example schematic diagram of a right-to-left acquisition order according to an embodiment of this application;
FIG. 17 is an example schematic diagram of a top-to-bottom acquisition order according to an embodiment of this application;
FIG. 18 is an example schematic diagram of an upper-right-to-lower-left acquisition order according to an embodiment of this application;
FIGS. 19 to 30 are example schematic diagrams of different acquisition orders according to embodiments of this application;
FIG. 31 is another example flowchart of a motion information prediction method according to an embodiment of this application;
FIG. 32 is an example structural block diagram of a motion information prediction apparatus according to an embodiment of this application;
FIG. 33 is another example structural block diagram of a motion information prediction apparatus according to an embodiment of this application;
FIG. 34 is a schematic structural block diagram of a motion information prediction device according to an embodiment of this application.
The following clearly and completely describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application.
FIG. 1 is a block diagram of an example video coding system 1 described in the embodiments of this application. As used herein, the term "video coder" generally refers to both video encoders and video decoders. In this application, the term "video coding" or "coding" may generally refer to video encoding or video decoding. The video encoder 100 and the video decoder 200 of the video coding system 1 are configured to predict the motion information, for example a motion vector, of a currently coded image block or a sub-block thereof according to the various method examples described in any of the multiple new inter prediction modes proposed in this application, so that the predicted motion vector is as close as possible to the motion vector obtained by a motion estimation method. In this way, no motion vector difference needs to be transmitted during encoding, which further improves coding and decoding performance.
As shown in FIG. 1, the video coding system 1 includes a source device 10 and a destination device 20. The source device 10 generates encoded video data; therefore, the source device 10 may be referred to as a video encoding device. The destination device 20 may decode the encoded video data generated by the source device 10; therefore, the destination device 20 may be referred to as a video decoding device. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.
The source device 10 and the destination device 20 may include various devices, including desktop computers, mobile computing devices, notebook (for example, laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
The destination device 20 may receive the encoded video data from the source device 10 via a link 30. The link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may include one or more communication media that enable the source device 10 to transmit the encoded video data directly to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard (for example, a wireless communication protocol), and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20.
In another example, the encoded data may be output from an output interface 140 to a storage device 40. Similarly, the encoded data may be accessed from the storage device 40 through an input interface 240. The storage device 40 may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, a Blu-ray disc, a DVD, a CD-ROM, flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
In another example, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded video generated by the source device 10. The destination device 20 may access the stored video data from the storage device 40 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to the destination device 20. Example file servers include a web server (for example, for a website), an FTP server, a network-attached storage (NAS) device, or a local disk drive. The destination device 20 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, DSL, a cable modem, or the like), or a combination of both suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the storage device 40 may be a streaming transmission, a download transmission, or a combination of the two.
The motion vector prediction techniques of this application may be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (for example, via the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the video coding system 1 may be used to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
The video coding system 1 illustrated in FIG. 1 is merely an example, and the techniques of this application may apply to video coding settings (for example, video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store the data in a memory, and/or a video decoding device may retrieve data from the memory and decode the data. In many examples, encoding and decoding are performed by devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode the data.
In the example of FIG. 1, the source device 10 includes a video source 120, the video encoder 100, and an output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture device (for example, a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.
The video encoder 100 may encode video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and/or playback.
In the example of FIG. 1, the destination device 20 includes an input interface 240, the video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data via the link 30 and/or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to it. Generally, the display device 220 displays the decoded video data. The display device 220 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not illustrated in FIG. 1, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to handle encoding of both audio and video in a common data stream or in separate data streams. In some examples, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, for example: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If this application is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware, thereby implementing the techniques of this application. Any of the foregoing (including hardware, software, a combination of hardware and software, and the like) may be regarded as one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
This application may generally refer to the video encoder 100 as "signaling" or "transmitting" certain information to another device such as the video decoder 200. The terms "signaling" and "transmitting" may generally refer to the transfer of syntax elements and/or other data used to decode the compressed video data. Such transfer may occur in real time or almost in real time. Alternatively, such communication may occur over a span of time, for example, when syntax elements are stored in an encoded bitstream on a computer-readable storage medium at encoding time; the decoding device may then retrieve the syntax elements at any time after they have been stored on this medium.
JCT-VC developed the H.265 (HEVC) standard. HEVC standardization is based on an evolved model of a video decoding device called the HEVC test model (HM). The latest H.265 standard document is available at http://www.itu.int/rec/T-REC-H.265; the latest version of the standard document is H.265 (12/16), and that standard document is incorporated herein by reference in its entirety. HM assumes that the video decoding device has several additional capabilities relative to the existing algorithms of ITU-T H.264/AVC. For example, H.264 provides 9 intra prediction coding modes, whereas HM can provide up to 35 intra prediction coding modes.
JVET is committed to developing the H.266 standard. The H.266 standardization process is based on an evolved model of a video decoding device called the H.266 test model. The algorithm description of H.266 is available at http://phenix.int-evry.fr/jvet, where the latest algorithm description is contained in JVET-F1001-v2; that algorithm description document is incorporated herein by reference in its entirety. Meanwhile, the reference software for the JEM test model is available at https://jvet.hhi.fraunhofer.de/svn/svn_HMJEMSoftware/, which is likewise incorporated herein by reference in its entirety.
In general, the working model description of HM states that a video frame or image may be divided into a sequence of tree blocks, or largest coding units (LCUs), containing both luma and chroma samples; an LCU is also referred to as a CTU. A tree block serves a purpose similar to that of a macroblock of the H.264 standard. A slice contains several consecutive tree blocks in decoding order. A video frame or image may be partitioned into one or more slices. Each tree block may be split into coding units according to a quadtree. For example, a tree block that is the root node of the quadtree may be split into four child nodes, and each child node may in turn be a parent node split into another four child nodes. The final, unsplittable child nodes, which are the leaf nodes of the quadtree, constitute decoding nodes, for example, decoded video blocks. Syntax data associated with the decoded bitstream may define the maximum number of times a tree block can be split, and may also define the minimum size of a decoding node.
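As a rough illustration of the quadtree splitting described above, the following sketch enumerates the leaf blocks of a tree block. The predicate `should_split`, standing in for the encoder's actual split decision, is a hypothetical placeholder:

```python
def quadtree_leaves(x, y, size, min_size, should_split):
    """Recursively split a tree block into coding units: each split
    node becomes the parent of four equal square sub-blocks, and the
    unsplit leaves are the decoding nodes."""
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf node: a decoding node
    half = size // 2
    leaves = []
    for dx, dy in ((0, 0), (half, 0), (0, half), (half, half)):
        leaves += quadtree_leaves(x + dx, y + dy, half, min_size, should_split)
    return leaves
```

For example, splitting a 64×64 tree block exactly once yields four 32×32 leaves.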
A coding unit includes a decoding node, prediction units (PUs), and transform units (TUs) associated with the decoding node. The size of the CU corresponds to the size of the decoding node, and its shape must be square. The size of a CU may range from 8×8 pixels up to the size of a tree block with a maximum of 64×64 pixels or larger. Each CU may contain one or more PUs and one or more TUs. For example, the syntax data associated with a CU may describe how the CU is partitioned into one or more PUs. The partitioning mode may differ depending on whether the CU is skipped or encoded in direct mode, intra prediction mode, or inter prediction mode. A PU may be partitioned into a non-square shape. For example, the syntax data associated with a CU may also describe how the CU is partitioned into one or more TUs according to a quadtree. The shape of a TU may be square or non-square.
The HEVC standard allows transforms according to TUs, which may be different for different CUs. A TU is usually sized based on the size of the PUs within a given CU defined for the partitioned LCU, although this may not always be the case. The size of a TU is usually the same as or smaller than that of a PU. In some feasible implementations, a quadtree structure known as a "residual quadtree" (RQT) may be used to subdivide the residual samples corresponding to a CU into smaller units. The leaf nodes of the RQT may be referred to as TUs. The pixel difference values associated with a TU may be transformed to produce transform coefficients, which may be quantized.
In general, a PU contains data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may contain data describing the intra prediction mode of the PU. As another feasible implementation, when the PU is inter-mode encoded, the PU may contain data defining the motion vector of the PU. For example, the data defining the motion vector of the PU may describe the horizontal component of the motion vector, the vertical component of the motion vector, the resolution of the motion vector (for example, quarter-pixel precision or eighth-pixel precision), the reference image to which the motion vector points, and/or the reference image list for the motion vector (for example, list 0, list 1, or list C).
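The motion data enumerated above can be pictured as a simple record. This is an illustrative sketch only; the field names are hypothetical rather than taken from the standard:

```python
from dataclasses import dataclass

@dataclass
class MotionData:
    """Illustrative record of the data that may define a PU's motion vector."""
    mv_x: int            # horizontal component, in sub-pel units
    mv_y: int            # vertical component, in sub-pel units
    precision_frac: int  # sub-pel divisions per pixel: 4 = quarter-pel, 8 = eighth-pel
    ref_idx: int         # index into the reference image list
    ref_list: str        # which reference image list: 'L0', 'L1', or 'LC'

    def in_pixels(self):
        """Convert the sub-pel components to pixel units."""
        return (self.mv_x / self.precision_frac, self.mv_y / self.precision_frac)
```

For example, a quarter-pel vector (6, -2) corresponds to a displacement of (1.5, -0.5) pixels.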
In general, a TU uses transform and quantization processes. A given CU with one or more PUs may also contain one or more TUs. After prediction, the video encoder 100 may calculate residual values corresponding to a PU. The residual values include pixel difference values, which may be transformed into transform coefficients, quantized, and scanned using the TUs to produce serialized transform coefficients for entropy coding. This application generally uses the term "video block" to refer to the decoding node of a CU. In some specific applications, this application may also use the term "video block" to refer to a tree block containing a decoding node as well as PUs and TUs, for example, an LCU or a CU.
A video sequence usually contains a series of video frames or images. A group of pictures (GOP) illustratively includes a series of one or more video images. A GOP may include syntax data in the header information of the GOP, in the header information of one or more of the images, or elsewhere; the syntax data describes the number of images included in the GOP. Each slice of an image may contain slice syntax data describing the coding mode of the corresponding image. The video encoder 100 usually operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a decoding node within a CU. Video blocks may have fixed or varying sizes, and may differ in size according to a specified decoding standard.
As a feasible implementation, HM supports prediction with various PU sizes. Assuming that the size of a specific CU is 2N×2N, HM supports intra prediction with PU sizes of 2N×2N or N×N, and inter prediction with symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. HM also supports asymmetric partitioning for inter prediction with PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of the CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% segment is indicated by "n" followed by an indication of "Up", "Down", "Left", or "Right". Thus, for example, "2N×nU" refers to a horizontally partitioned 2N×2N CU with a 2N×0.5N PU on top and a 2N×1.5N PU at the bottom.
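A small sketch of the asymmetric partition geometry described above; the function name and mode strings are hypothetical labels chosen for illustration:

```python
def amp_partition(size, mode):
    """Pixel sizes (width, height) of the two PUs for the asymmetric
    partition modes of a size x size CU, where size = 2N: one direction
    is left unpartitioned, the other is split 25% / 75%."""
    q = size // 4  # 0.5N, the 25% share
    if mode == '2NxnU':
        return [(size, q), (size, size - q)]          # top 2Nx0.5N, bottom 2Nx1.5N
    if mode == '2NxnD':
        return [(size, size - q), (size, q)]          # top 2Nx1.5N, bottom 2Nx0.5N
    if mode == 'nLx2N':
        return [(q, size), (size - q, size)]          # left 0.5Nx2N, right 1.5Nx2N
    if mode == 'nRx2N':
        return [(size - q, size), (q, size)]          # left 1.5Nx2N, right 0.5Nx2N
    raise ValueError(mode)
```

For a 64×64 CU (N = 32), mode 2N×nU yields a 64×16 PU above a 64×48 PU.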
In this application, "N×N" and "N by N" are used interchangeably to refer to the pixel dimensions of a video block in terms of its vertical and horizontal dimensions, for example, 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an N×N block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a non-negative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block does not necessarily need to have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may include N×M pixels, where M is not necessarily equal to N.
After intra-predictive or inter-predictive decoding using the PUs of a CU, the video encoder 100 may calculate the residual data for the TUs of the CU. A PU may include pixel data in the spatial domain (also referred to as the pixel domain), and a TU may include coefficients in the transform domain after a transform (for example, a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform) is applied to the residual video data. The residual data may correspond to pixel differences between the pixels of the unencoded image and the prediction values corresponding to the PUs. The video encoder 100 may form TUs containing the residual data of the CU, and then transform the TUs to produce the transform coefficients of the CU.
After any transform to produce transform coefficients, the video encoder 100 may perform quantization of the transform coefficients. Quantization illustratively refers to the process of quantizing coefficients to possibly reduce the amount of data used to represent them, thereby providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
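The bit-depth reduction mentioned above, rounding an n-bit value down to an m-bit value, can be illustrated as follows. This is a simplified sketch of the idea, not the quantizer of any particular codec:

```python
def round_down_bits(value, n, m):
    """Reduce an n-bit value to m bits by dropping the (n - m) least
    significant bits, i.e. rounding the value down."""
    assert n >= m, "target bit depth must not exceed source bit depth"
    return value >> (n - m)
```

For example, the 8-bit value 183 (0b10110111) rounded down to 4 bits becomes 11 (0b1011).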
JEM模型对视频图像的编码结构进行了进一步的改进,具体的,被称为“四叉树结合二叉树”(QTBT)的块编码结构被引入进来。QTBT结构摒弃了HEVC中的CU,PU,TU等概念,支持更灵活的CU划分形状,一个CU可以正方形,也可以是长方形。一个CTU首先进行四叉树划分,该四叉树的叶节点进一步进行二叉树划分。同时,在二叉树划分中存在两种划分模式,对称水平分割和对称竖直分割。二叉树的叶节点被称为CU,JEM的CU在预测和变换的过程中都不可以被进一步划分,也就是说JEM的CU,PU,TU具有相同的块大小。在现阶段的JEM中,CTU的最大尺寸为256×256亮度像素。The JEM model further improves the coding structure of video images. Specifically, a block coding structure called "Quad Tree Combined with Binary Tree" (QTBT) is introduced. The QTBT structure abandons the concepts of CU, PU, and TU in HEVC, and supports more flexible CU division shapes. A CU can be square or rectangular. A CTU first performs a quadtree partition, and the leaf nodes of the quadtree further perform a binary tree partition. At the same time, there are two partitioning modes in binary tree partitioning, symmetrical horizontal partitioning and symmetrical vertical partitioning. The leaf nodes of a binary tree are called CUs. JEM's CUs cannot be further divided during the prediction and transformation process, which means that JEM's CU, PU, and TU have the same block size. In the current JEM, the maximum size of the CTU is 256 × 256 luminance pixels.
In some feasible implementations, video encoder 100 may scan the quantized transform coefficients using a predefined scan order to produce a serialized vector that can be entropy encoded. In other feasible implementations, video encoder 100 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 100 may entropy encode the one-dimensional vector according to context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method. Video encoder 100 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 200 in decoding the video data.
To perform CABAC, video encoder 100 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero. To perform CAVLC, video encoder 100 may select a variable-length code for the symbol to be transmitted. Codewords in variable-length coding (VLC) may be constructed such that relatively short codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may save bits relative to using equal-length codewords for every symbol to be transmitted. The probability in CABAC may be determined based on the context assigned to the symbol.
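The bit-saving property of VLC can be illustrated with a toy Huffman-style code (this is an assumption-laden sketch; the actual CAVLC tables are fixed by the standard and are not constructed this way at runtime):

```python
# Toy example, not the real CAVLC tables: assign shorter codewords to more
# probable symbols and compare the expected length against a fixed-length code.
import heapq

def huffman_lengths(probs):
    """Return the codeword length per symbol for a Huffman code over `probs`."""
    heap = [(p, [s]) for s, p in probs.items()]
    lengths = {s: 0 for s in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)          # merge the two least probable
        p2, s2 = heapq.heappop(heap)          # groups; every symbol inside
        for s in s1 + s2:                     # them gains one codeword bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}
lengths = huffman_lengths(probs)
avg = sum(probs[s] * lengths[s] for s in probs)
print(lengths)  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print(avg)      # 1.75 bits/symbol, versus 2 bits for a fixed-length code
```

The most probable symbol receives a one-bit codeword, so the average rate (1.75 bits) is below the 2 bits that equal-length codewords for four symbols would require.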
In the embodiments of this application, the video encoder may perform inter prediction to reduce temporal redundancy between images. As described above, a CU may have one or more prediction units (PUs) according to the provisions of different video compression codec standards. In other words, multiple PUs may belong to one CU, or the PU and the CU may have the same size. Herein, when the CU and the PU have the same size, the partition mode of the CU is no partitioning, or the CU is partitioned into one PU, and the term PU is used uniformly. When the video encoder performs inter prediction, it may signal the motion information of the PU to the video decoder. Illustratively, the motion information of a PU may include a reference image index, a motion vector, and a prediction direction identifier. The motion vector may indicate a displacement between the image block of the PU (also called a video block, pixel block, pixel set, etc.) and a reference block of the PU. The reference block of the PU may be a portion of a reference image that is similar to the image block of the PU, and may be located in the reference image indicated by the reference image index and the prediction direction identifier.
To reduce the number of coding bits required to represent the motion information of the PUs, the video encoder may generate a candidate predicted motion vector (MV) list for each PU according to a merge prediction mode or an advanced motion vector prediction (AMVP) mode process. Each candidate predicted motion vector in the list for a PU may indicate motion information. The motion information indicated by some candidates in the list may be based on the motion information of other PUs. If a candidate indicates the motion information of one of the specified spatial or temporal candidate positions, this application may refer to that candidate as an "original" candidate predicted motion vector. For example, in merge mode, also referred to herein as merge prediction mode, there may be five original spatial candidate positions and one original temporal candidate position. In some examples, the video encoder may generate additional candidates by combining partial motion vectors from different original candidates, by modifying original candidates, or simply by inserting zero motion vectors as candidates. These additional candidates are not considered original candidate predicted motion vectors, and may be referred to in this application as artificially generated candidate predicted motion vectors.
The techniques of this application generally relate to generating a candidate predicted motion vector list at the video encoder and generating the same candidate predicted motion vector list at the video decoder. The video encoder and video decoder may generate the same list by implementing the same construction technique. For example, both may build a list with the same number of candidates (e.g., five). They may first consider spatial candidates (e.g., neighboring blocks in the same image), then temporal candidates (e.g., candidates in different images), and finally artificially generated candidates, until the desired number of candidates has been added to the list. According to the techniques of this application, a pruning operation may be applied to certain types of candidates during list construction, to remove duplicates from the list, while for other types of candidates pruning may be skipped in order to reduce decoder complexity. For example, for the set of spatial candidates and for the temporal candidate, a pruning operation may be performed to exclude candidates with duplicate motion information from the list. However, when an artificially generated candidate is added to the list, it may be added without performing the pruning operation on it.
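The ordering and selective pruning just described can be sketched as follows (candidate values and the list size of five are illustrative; the real derivation order and positions are fixed by the codec specification):

```python
# Hypothetical merge-list construction: spatial then temporal candidates are
# pruned against the list (duplicates dropped); artificially generated
# zero-MV candidates are then appended WITHOUT pruning until the list is full.
MAX_CANDIDATES = 5

def build_merge_list(spatial, temporal):
    """Each candidate is a (mv_x, mv_y, ref_idx) tuple."""
    cand_list = []
    for cand in spatial + temporal:            # original candidates: pruned
        if cand not in cand_list and len(cand_list) < MAX_CANDIDATES:
            cand_list.append(cand)
    while len(cand_list) < MAX_CANDIDATES:     # artificial candidates: no pruning
        cand_list.append((0, 0, 0))
    return cand_list

spatial = [(4, -2, 0), (4, -2, 0), (1, 3, 1)]  # contains a spatial duplicate
temporal = [(1, 3, 1)]                         # duplicates a spatial candidate
print(build_merge_list(spatial, temporal))
# [(4, -2, 0), (1, 3, 1), (0, 0, 0), (0, 0, 0), (0, 0, 0)]
```

Skipping the duplicate check for the zero-MV fillers is exactly the decoder-complexity trade-off the paragraph mentions: the padded entries may repeat, but no comparisons are spent on them.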
After generating the candidate predicted motion vector list for a PU of a CU, the video encoder may select a candidate from the list and output a candidate index in the bitstream. The selected candidate may be the one whose motion vector produces the predictor that most closely matches the target PU being coded. The candidate index may indicate the position of the selected candidate in the list. The video encoder may also generate a predictive image block for the PU based on the reference block indicated by the PU's motion information. The PU's motion information may be determined based on the motion information indicated by the selected candidate. For example, in merge mode the PU's motion information may be the same as the motion information indicated by the selected candidate, while in AMVP mode the PU's motion information may be determined based on the PU's motion vector difference and the motion information indicated by the selected candidate. The video encoder may generate one or more residual image blocks for the CU based on the predictive image blocks of the CU's PUs and the original image block of the CU, and may then encode the one or more residual image blocks and output them in the bitstream.
The bitstream may include data identifying the selected candidate in the PU's candidate predicted motion vector list. The video decoder may determine the PU's motion information based on the motion information indicated by that selected candidate, and may identify one or more reference blocks for the PU based on the PU's motion information. After identifying the one or more reference blocks of the PU, the video decoder may generate a predictive image block for the PU based on them. The video decoder may reconstruct the image block of the CU based on the predictive image blocks of the CU's PUs and the one or more residual image blocks of the CU.
For ease of explanation, this application may describe a position or an image block as having various spatial relationships with a CU or a PU. This description may be interpreted to mean that the position or image block has various spatial relationships with the image block associated with the CU or PU. In addition, the PU currently being decoded by the video decoder may be referred to herein as the current PU, also called the current to-be-processed image block; the CU currently being decoded may be referred to as the current CU; and the image currently being decoded may be referred to as the current image. It should be understood that this application also applies to the case where the PU and the CU have the same size, or the PU is the CU; the term PU is used uniformly.
As briefly described above, video encoder 100 may use inter prediction to generate predictive image blocks and motion information for the PUs of a CU. In many instances, the motion information of a given PU may be the same as or similar to that of one or more nearby PUs (i.e., PUs whose image blocks are spatially or temporally near the image block of the given PU). Because nearby PUs often have similar motion information, video encoder 100 may encode the motion information of the given PU with reference to the motion information of nearby PUs. Encoding the given PU's motion information with reference to nearby PUs can reduce the number of coding bits required in the bitstream to indicate the given PU's motion information.
Video encoder 100 may encode the motion information of a given PU with reference to the motion information of nearby PUs in various ways. For example, video encoder 100 may indicate that the motion information of the given PU is the same as that of a nearby PU. This application may use "merge mode" to refer to indicating that the motion information of the given PU is the same as, or can be derived from, the motion information of a nearby PU. In another feasible implementation, video encoder 100 may calculate a motion vector difference (MVD) for the given PU. The MVD indicates the difference between the motion vector of the given PU and the motion vector of a nearby PU. Video encoder 100 may include the MVD, rather than the given PU's motion vector, in the given PU's motion information. Representing the MVD in the bitstream requires fewer coding bits than representing the given PU's motion vector. This application may use "advanced motion vector prediction (AMVP) mode" to refer to signaling the motion information of the given PU to the decoder by using an MVD and an index value identifying a candidate motion vector.
To signal the motion information of a given PU to the decoder using merge mode or AMVP mode, video encoder 100 may generate a candidate predicted motion vector list for the given PU. The list may include one or more candidate predicted motion vectors, each of which may specify motion information. The motion information indicated by each candidate may include a motion vector, a reference image index, and a prediction direction identifier. The candidates in the list may include "original" candidates, each of which indicates the motion information of one of the specified candidate positions within a PU other than the given PU.
After generating the candidate predicted motion vector list for the PU, video encoder 100 may select one of the candidates from the list. For example, the video encoder may compare each candidate with the PU being coded and select the candidate with the desired rate-distortion cost. Video encoder 100 may output a candidate index for the PU, which identifies the position of the selected candidate in the candidate predicted motion vector list.
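The encoder-side choice can be sketched with the usual Lagrangian cost J = D + λ·R; the distortion/rate pairs and λ below are toy stand-ins, not values prescribed by the patent:

```python
# Hedged sketch of rate-distortion candidate selection: score each candidate
# with J = D + lambda * R and pick the index of the cheapest one; that index
# is what gets written to the bitstream.
def select_candidate(costs, lam=0.5):
    """costs: list of (distortion, rate_bits) per candidate.
    Returns (best_index, best_cost)."""
    best_idx, best_cost = 0, float('inf')
    for idx, (dist, rate) in enumerate(costs):
        j = dist + lam * rate
        if j < best_cost:
            best_idx, best_cost = idx, j
    return best_idx, best_cost

costs = [(10.0, 4), (6.0, 6), (7.0, 3)]   # hypothetical (D, R) pairs
print(select_candidate(costs))            # (2, 8.5)
```

Candidate 1 has the lowest distortion, but candidate 2's cheaper rate wins once λ weighs the bits, which is the trade-off the rate-distortion criterion exists to arbitrate.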
In addition, video encoder 100 may generate a predictive image block for the PU based on the reference block indicated by the PU's motion information. The PU's motion information may be determined based on the motion information indicated by the selected candidate in the PU's candidate predicted motion vector list. For example, in merge mode the PU's motion information may be the same as the motion information indicated by the selected candidate; in AMVP mode the PU's motion information may be determined based on the PU's motion vector difference and the motion information indicated by the selected candidate. Video encoder 100 may process the predictive image block of the PU as described above.
When video decoder 200 receives the bitstream, it may generate a candidate predicted motion vector list for each PU of the CU. The list generated by video decoder 200 for a PU may be the same as the list generated by video encoder 100 for that PU. A syntax element parsed from the bitstream may indicate the position of the selected candidate in the PU's list. After generating the candidate list for the PU, video decoder 200 may generate a predictive image block for the PU based on the one or more reference blocks indicated by the PU's motion information. Video decoder 200 may determine the PU's motion information based on the motion information indicated by the selected candidate in the PU's list, and may reconstruct the image block of the CU based on the predictive image blocks of the PUs and the residual image blocks of the CU.
It should be understood that, in one feasible implementation, at the decoder side the construction of the candidate predicted motion vector list and the parsing of the selected candidate's position in the list from the bitstream are independent of each other, and may be performed in either order or in parallel.
In another feasible implementation, at the decoder side the position of the selected candidate in the list is first parsed from the bitstream, and the candidate list is then constructed based on the parsed position. In this implementation, it is not necessary to construct the entire candidate list; it is sufficient to construct the list up to the parsed position, i.e., far enough that the candidate at that position can be determined. For example, when parsing the bitstream shows that the selected candidate has index 3 in the list, only the candidates from index 0 to index 3 need to be constructed in order to determine the candidate at index 3, which achieves the technical effect of reducing complexity and improving decoding efficiency.
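The early-exit behavior can be sketched with a generator, so that candidates beyond the parsed index are simply never derived (the candidate values are hypothetical):

```python
# Sketch of the decoder-side shortcut: parse the selected index first, then
# derive candidates lazily and stop as soon as the selected one is reached.
def derive_candidates():
    """Hypothetical lazy derivation, yielding candidates in list order."""
    yield from [(4, -2, 0), (0, 1, 0), (3, 3, 1), (1, 3, 1), (0, 0, 0)]

def candidate_at(parsed_index):
    """Build only indices 0..parsed_index and return the selected candidate."""
    for idx, cand in enumerate(derive_candidates()):
        if idx == parsed_index:
            return cand           # later candidates are never constructed
    raise ValueError("index beyond candidate list")

print(candidate_at(3))  # (1, 3, 1): only four candidates were built
```

With a parsed index of 3, the fifth candidate is never materialized, mirroring the complexity reduction the paragraph describes.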
FIG. 2 is a block diagram of an example video encoder 100 described in the embodiments of this application. Video encoder 100 is configured to output video to post-processing entity 41. Post-processing entity 41 represents an example of a video entity that can process encoded video data from video encoder 100, such as a media-aware network element (MANE) or a splicing/editing device. In some cases, post-processing entity 41 may be an instance of a network entity. In some video encoding systems, post-processing entity 41 and video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to post-processing entity 41 may be performed by the same device that includes video encoder 100. In one example, post-processing entity 41 is an instance of storage device 40 of FIG. 1.
In the example of FIG. 2, video encoder 100 includes prediction processing unit 108, filter unit 106, decoded picture buffer (DPB) 107, summer 112, transformer 101, quantizer 102, and entropy encoder 103. Prediction processing unit 108 includes inter predictor 110 and intra predictor 109. For image block reconstruction, video encoder 100 also includes inverse quantizer 104, inverse transformer 105, and summer 111. Filter unit 106 is intended to represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although filter unit 106 is shown in FIG. 2A as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, video encoder 100 may further include a video data memory and a partitioning unit (not shown in the figure).
The video data memory may store video data to be encoded by the components of video encoder 100. The video data stored in the video data memory may be obtained from video source 120. DPB 107 may be a reference picture memory that stores reference video data used by video encoder 100 to encode video data in intra or inter coding modes. The video data memory and DPB 107 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. The video data memory and DPB 107 may be provided by the same memory device or by separate memory devices. In various examples, the video data memory may be on-chip with other components of video encoder 100, or off-chip relative to those components.
As shown in FIG. 2, video encoder 100 receives video data and stores the video data in the video data memory. The partitioning unit partitions the video data into image blocks, and these blocks may be further partitioned into smaller blocks, for example based on a quadtree structure or a binary tree structure. Such partitioning may also include partitioning into slices, tiles, or other larger units. Video encoder 100 generally illustrates the components that encode image blocks within a video slice to be encoded. A slice may be divided into multiple image blocks (and possibly into sets of image blocks called tiles). Prediction processing unit 108 may select one of multiple possible coding modes for the current image block, such as one of multiple intra coding modes or one of multiple inter coding modes. Prediction processing unit 108 may provide the resulting intra- or inter-coded block to summer 112 to generate a residual block, and to summer 111 to reconstruct the encoded block for use as part of a reference image.
Intra predictor 109 within prediction processing unit 108 may perform intra predictive encoding of the current image block relative to one or more neighboring blocks in the same frame or slice as the current block to be encoded, to remove spatial redundancy. Inter predictor 110 within prediction processing unit 108 may perform inter predictive encoding of the current image block relative to one or more prediction blocks in one or more reference images, to remove temporal redundancy.
Specifically, inter predictor 110 may be configured to determine the inter prediction mode used to encode the current image block. For example, inter predictor 110 may use rate-distortion analysis to calculate rate-distortion values for the various inter prediction modes in the candidate inter prediction mode set, and select the inter prediction mode with the best rate-distortion characteristics from among them. Rate-distortion analysis generally determines the amount of distortion (or error) between the encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the bit rate (that is, the number of bits) used to produce the encoded block. For example, inter predictor 110 may determine that the inter prediction mode in the candidate set with the lowest rate-distortion cost for encoding the current image block is the inter prediction mode to be used for inter prediction of the current image block.
Inter predictor 110 is configured to predict the motion information (e.g., motion vectors) of one or more sub-blocks of the current image block based on the determined inter prediction mode, and to obtain or generate the prediction block of the current image block using the motion information (e.g., motion vectors) of the one or more sub-blocks. Inter predictor 110 may locate the prediction block pointed to by the motion vector in one of the reference image lists. Inter predictor 110 may also generate syntax elements associated with the image blocks and the video slice for use by video decoder 200 in decoding the image blocks of the video slice. Alternatively, in one example, inter predictor 110 performs a motion compensation process using the motion information of each sub-block to generate a prediction block for each sub-block, thereby obtaining the prediction block of the current image block. It should be understood that inter predictor 110 here performs the motion estimation and motion compensation processes.
Specifically, after selecting the inter prediction mode for the current image block, inter predictor 110 may provide information indicating the selected inter prediction mode of the current image block to entropy encoder 103, so that entropy encoder 103 encodes the information indicating the selected inter prediction mode.
Intra predictor 109 may perform intra prediction on the current image block. Specifically, intra predictor 109 may determine the intra prediction mode used to encode the current block. For example, intra predictor 109 may use rate-distortion analysis to calculate rate-distortion values for the various intra prediction modes to be tested, and select the intra prediction mode with the best rate-distortion characteristics from among the modes to be tested. In any case, after selecting the intra prediction mode for the image block, intra predictor 109 may provide information indicating the selected intra prediction mode of the current image block to entropy encoder 103, so that entropy encoder 103 encodes the information indicating the selected intra prediction mode.
After prediction processing unit 108 generates the prediction block of the current image block via inter or intra prediction, video encoder 100 forms a residual image block by subtracting the prediction block from the current image block to be encoded. Summer 112 represents the component or components that perform this subtraction operation. The residual video data in the residual block may be included in one or more TUs and applied to transformer 101. Transformer 101 transforms the residual video data into residual transform coefficients using a transform such as a discrete cosine transform (DCT) or a conceptually similar transform. Transformer 101 may convert the residual video data from a pixel value domain to a transform domain, such as a frequency domain.
Transformer 101 may send the resulting transform coefficients to quantizer 102. Quantizer 102 quantizes the transform coefficients to further reduce the bit rate. In some examples, quantizer 102 may then perform a scan of the matrix containing the quantized transform coefficients. Alternatively, entropy encoder 103 may perform the scan.
After quantization, the entropy encoder 103 entropy-encodes the quantized transform coefficients. For example, the entropy encoder 103 may perform context-adaptive variable-length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or another entropy coding method or technique. After entropy encoding by the entropy encoder 103, the encoded bitstream may be transmitted to the video decoder 200, or archived for later transmission or retrieval by the video decoder 200. The entropy encoder 103 may also entropy-encode the syntax elements of the current image block to be encoded.
The inverse quantizer 104 and the inverse transformer 105 apply inverse quantization and an inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example for later use as a reference block of a reference image. The summer 111 adds the reconstructed residual block to the prediction block generated by the inter predictor 110 or the intra predictor 109 to produce a reconstructed image block. The filter unit 106 may be applied to the reconstructed image block to reduce distortion such as block artifacts. The reconstructed image block is then stored as a reference block in the decoded image buffer 107, and may be used by the inter predictor 110 as a reference block for inter prediction of blocks in subsequent video frames or images.
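The reconstruction path (inverse quantization, then adding the prediction back) can be sketched as below. For brevity the transform is taken to be the identity and the sample values are hypothetical; this illustrates the dataflow through the inverse quantizer 104 and the summer 111, not their actual implementations.

```python
def dequantize(levels, step):
    # Inverse quantization: reconstructed coefficient = level * step.
    return [l * step for l in levels]

pred   = [50, 54, 60, 64]   # prediction block samples (hypothetical)
levels = [2, 1, 1, 2]       # quantized residual (step = 1, identity transform)

resid_rec = dequantize(levels, step=1)
# Summer 111: reconstructed block = prediction + reconstructed residual.
recon = [p + r for p, r in zip(pred, resid_rec)]
print(recon)  # -> [52, 55, 61, 66]; stored in the decoded image buffer 107
```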
It should be understood that other structural variations of the video encoder 100 may be used to encode the video stream. For example, for certain image blocks or image frames, the video encoder 100 may quantize the residual signal directly without processing by the transformer 101, and correspondingly without processing by the inverse transformer 105. Alternatively, for certain image blocks or image frames, the video encoder 100 generates no residual data, and correspondingly no processing by the transformer 101, the quantizer 102, the inverse quantizer 104, or the inverse transformer 105 is needed. Alternatively, the video encoder 100 may store the reconstructed image block directly as a reference block without processing by the filter unit 106; or the quantizer 102 and the inverse quantizer 104 in the video encoder 100 may be merged together.
FIG. 3 is a block diagram of an example video decoder 200 described in an embodiment of this application. In the example of FIG. 3, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a decoded image buffer 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process that is substantially the inverse of the encoding process described with respect to the video encoder 100 of FIG. 2.
During the decoding process, the video decoder 200 receives from the video encoder 100 an encoded video bitstream representing image blocks of an encoded video slice and associated syntax elements. The video decoder 200 may receive video data from the network entity 42 and, optionally, may also store the video data in a video data memory (not shown in the figure). The video data memory may store video data to be decoded by components of the video decoder 200, such as the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing encoded video data from the encoded video bitstream. Therefore, although the video data memory is not shown in FIG. 3, the video data memory and the DPB 207 may be the same memory, or may be separately provided memories. The video data memory and the DPB 207 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. In various examples, the video data memory may be integrated on-chip with other components of the video decoder 200, or provided off-chip relative to those components.
The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such device for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder, such as the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, the network entity 42 may implement some of the techniques described in this application. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to the network entity 42 may be performed by the same device that includes the video decoder 200. In some cases, the network entity 42 may be an example of the storage device 40 of FIG. 1.
The entropy decoder 203 of the video decoder 200 entropy-decodes the bitstream to produce quantized coefficients and some syntax elements. The entropy decoder 203 forwards the syntax elements to the prediction processing unit 208. The video decoder 200 may receive syntax elements at the video slice level and/or the image block level.
When a video slice is decoded as an intra-decoded (I) slice, the intra predictor 209 of the prediction processing unit 208 may generate a prediction block for an image block of the current video slice based on the signaled intra prediction mode and data from previously decoded blocks of the current frame or image. When a video slice is decoded as an inter-decoded (i.e., B or P) slice, the inter predictor 210 of the prediction processing unit 208 may determine, based on the syntax elements received from the entropy decoder 203, an inter prediction mode for decoding the current image block of the current video slice, and decode the current image block (for example, perform inter prediction) based on the determined inter prediction mode. Specifically, the inter predictor 210 may determine whether to predict the current image block of the current video slice using a new inter prediction mode. If the syntax elements indicate that a new inter prediction mode is to be used to predict the current image block, the inter predictor 210 predicts, based on the new inter prediction mode (for example, a new inter prediction mode specified by a syntax element, or a default new inter prediction mode), the motion information of the current image block or of a sub-block of the current image block, and then, through a motion compensation process, uses the predicted motion information of the current image block or sub-block to obtain or generate a prediction block for the current image block or sub-block. The motion information here may include reference image information and a motion vector, where the reference image information may include, but is not limited to, unidirectional/bidirectional prediction information, a reference image list number, and a reference image index corresponding to a reference image list. For inter prediction, the prediction block may be generated from one of the reference images in one of the reference image lists. The video decoder 200 may construct the reference image lists, namely list 0 and list 1, based on the reference images stored in the DPB 207. The reference frame index of the current image may be included in one or more of reference frame list 0 and list 1. In some examples, the video encoder 100 may signal a specific syntax element indicating whether a new inter prediction mode is used to decode a specific block, or may signal a specific syntax element indicating both whether a new inter prediction mode is used and which new inter prediction mode is specifically used to decode the specific block. It should be understood that the inter predictor 210 here performs the motion compensation process.
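The construction of the two reference image lists from the DPB can be sketched as below. The ordering rule (list 0 prefers past pictures by increasing distance, list 1 prefers future pictures) is a common convention assumed for illustration; the picture order counts (POCs) are hypothetical.

```python
# Hypothetical contents of the DPB 207, identified by picture order count (POC).
dpb = [8, 4, 12, 16]
current_poc = 10

# List 0: past pictures first (nearest first), then future pictures.
list0 = sorted([p for p in dpb if p < current_poc],
               key=lambda p: current_poc - p) + \
        sorted([p for p in dpb if p > current_poc])

# List 1: future pictures first (nearest first), then past pictures.
list1 = sorted([p for p in dpb if p > current_poc],
               key=lambda p: p - current_poc) + \
        sorted([p for p in dpb if p < current_poc], reverse=True)

# A reference image index selects a picture by its position in a list.
ref_idx = 0
print(list0, list1, list0[ref_idx])
```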
The inverse quantizer 204 inverse-quantizes, i.e., dequantizes, the quantized transform coefficients provided in the bitstream and decoded by the entropy decoder 203. The inverse quantization process may include using a quantization parameter calculated by the video encoder 100 for each image block in the video slice to determine the degree of quantization that was applied and, likewise, the degree of inverse quantization that should be applied. The inverse transformer 205 applies an inverse transform to the transform coefficients, such as an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to produce a residual block in the pixel domain.
After the inter predictor 210 generates a prediction block for the current image block or a sub-block of the current image block, the video decoder 200 sums the residual block from the inverse transformer 205 with the corresponding prediction block generated by the inter predictor 210 to obtain a reconstructed block, i.e., a decoded image block. The summer 211 represents the component that performs this summing operation. When needed, a loop filter (in the decoding loop or after the decoding loop) may also be used to smooth pixel transitions or otherwise improve video quality. The filter unit 206 may represent one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 206 is shown as an in-loop filter in FIG. 2B, in other implementations the filter unit 206 may be implemented as a post-loop filter. In one example, the filter unit 206 is applied to the reconstructed block to reduce block distortion, and the result is output as a decoded video stream. In addition, the decoded image blocks of a given frame or image may also be stored in the decoded image buffer 207, which stores reference images used for subsequent motion compensation. The decoded image buffer 207 may be part of a memory that may also store decoded video for later presentation on a display device (such as the display device 220 of FIG. 1), or may be separate from such a memory.
It should be understood that other structural variations of the video decoder 200 may be used to decode the encoded video bitstream. For example, the video decoder 200 may generate an output video stream without processing by the filter unit 206; or, for certain image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and correspondingly no processing by the inverse quantizer 204 and the inverse transformer 205 is needed.
As noted above, the techniques of this application exemplarily relate to inter decoding. It should be understood that the techniques of this application may be performed by any of the video coders described in this application, including, for example, the video encoder 100 and the video decoder 200 as shown and described with respect to FIGS. 1 to 3. That is, in one feasible implementation, the inter predictor 110 described with respect to FIG. 2 may perform the specific techniques described below when performing inter prediction during encoding of a block of video data. In another feasible implementation, the inter predictor 210 described with respect to FIG. 3 may perform the specific techniques described below when performing inter prediction during decoding of a block of video data. Therefore, a reference to a generic "video encoder" or "video decoder" may include the video encoder 100, the video decoder 200, or another video encoding or decoding unit.
FIG. 4 is a schematic block diagram of an inter prediction module in an embodiment of this application. The inter prediction module 121 may, for example, include a motion estimation unit 42 and a motion compensation unit 44. The relationship between PUs and CUs differs among video compression coding standards. The inter prediction module 121 may partition the current CU into PUs according to multiple partitioning modes. For example, the inter prediction module 121 may partition the current CU into PUs according to the 2N×2N, 2N×N, N×2N, and N×N partitioning modes. In other embodiments, the current CU is itself the current PU; this is not limited.
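The four partitioning modes named above can be sketched as a function mapping a CU size to PU rectangles. This is an illustrative enumeration of the geometry only, not the module's decision logic.

```python
def partition(cu_w, cu_h, mode):
    # Return the PU rectangles (x, y, w, h) for the classic partitioning
    # modes named in the text: 2Nx2N, 2NxN, Nx2N, NxN.
    if mode == "2Nx2N":
        return [(0, 0, cu_w, cu_h)]
    if mode == "2NxN":   # two horizontal halves
        return [(0, 0, cu_w, cu_h // 2), (0, cu_h // 2, cu_w, cu_h // 2)]
    if mode == "Nx2N":   # left and right halves
        return [(0, 0, cu_w // 2, cu_h), (cu_w // 2, 0, cu_w // 2, cu_h)]
    if mode == "NxN":    # four quadrants
        hw, hh = cu_w // 2, cu_h // 2
        return [(0, 0, hw, hh), (hw, 0, hw, hh),
                (0, hh, hw, hh), (hw, hh, hw, hh)]
    raise ValueError(mode)

for m in ("2Nx2N", "2NxN", "Nx2N", "NxN"):
    print(m, partition(16, 16, m))
```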
The inter prediction module 121 may perform integer motion estimation (IME) and then fractional motion estimation (FME) on each of the PUs. When the inter prediction module 121 performs IME on a PU, it may search one or more reference images for a reference block for the PU. After finding a reference block for the PU, the inter prediction module 121 may generate a motion vector that indicates, with integer precision, the spatial displacement between the PU and the reference block for the PU. When the inter prediction module 121 performs FME on the PU, it may refine the motion vector generated by performing IME on the PU. A motion vector generated by performing FME on a PU may have sub-integer precision (for example, 1/2-pixel precision, 1/4-pixel precision, etc.). After generating a motion vector for the PU, the inter prediction module 121 may use the motion vector for the PU to generate a predictive image block for the PU.
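The IME step can be sketched as a full search minimizing a sum-of-absolute-differences (SAD) cost. The 1-D signals, search range, and SAD criterion are illustrative assumptions; FME would then refine the returned integer displacement at sub-pel positions using interpolated reference samples, which is omitted here for brevity.

```python
def sad(a, b):
    # Sum of absolute differences, a common matching cost.
    return sum(abs(x - y) for x, y in zip(a, b))

def integer_me(cur, ref, start, rng):
    # IME: full search over integer displacements in [-rng, rng].
    best_d, best_cost = None, float("inf")
    for d in range(-rng, rng + 1):
        pos = start + d
        if 0 <= pos and pos + len(cur) <= len(ref):
            cost = sad(cur, ref[pos:pos + len(cur)])
            if cost < best_cost:
                best_d, best_cost = d, cost
    return best_d

# 1-D toy example: the current block matches the reference at offset +2.
ref = [0, 0, 10, 20, 30, 40, 0, 0]
cur = [10, 20, 30, 40]
mv_int = integer_me(cur, ref, start=0, rng=3)
print(mv_int)  # integer-precision motion vector
```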
In some feasible implementations in which the inter prediction module 121 signals the motion information of the PU to the decoding end using the AMVP mode, the inter prediction module 121 may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidates. After generating the candidate predicted motion vector list for the PU, the inter prediction module 121 may select a candidate predicted motion vector from the list and generate a motion vector difference (MVD) for the PU. The MVD for the PU may indicate the difference between the motion vector indicated by the selected candidate and the motion vector generated for the PU using IME and FME. In these feasible implementations, the inter prediction module 121 may output a candidate predicted motion vector index identifying the position of the selected candidate in the candidate predicted motion vector list. The inter prediction module 121 may also output the MVD of the PU. A feasible implementation of the advanced motion vector prediction (AMVP) mode in an embodiment of this application is described in detail below with reference to FIG. 6.
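The AMVP signalling described above (index into the candidate list plus an MVD, rather than the full motion vector) can be sketched as follows. The candidate values and the selection criterion (smallest MVD magnitude) are illustrative assumptions.

```python
# Hypothetical candidate predicted motion vectors and the MV found by IME/FME.
candidates = [(4, 0), (3, -1), (0, 0)]
mv = (5, -1)

def mvd(a, b):
    # Motion vector difference: actual MV minus predicted MV.
    return (a[0] - b[0], a[1] - b[1])

# One possible encoder criterion: pick the candidate minimizing |MVD|.
idx = min(range(len(candidates)),
          key=lambda i: abs(mv[0] - candidates[i][0]) + abs(mv[1] - candidates[i][1]))
d = mvd(mv, candidates[idx])
print(idx, d)  # what gets signalled to the decoder

# Decoder side: reconstruct the MV from the index and the MVD.
recon_mv = (candidates[idx][0] + d[0], candidates[idx][1] + d[1])
print(recon_mv)
```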
In addition to generating motion information for a PU by performing IME and FME on the PU, the inter prediction module 121 may also perform a merge operation on each of the PUs. When the inter prediction module 121 performs a merge operation on a PU, it may generate a candidate predicted motion vector list for the PU. The candidate predicted motion vector list for the PU may include one or more original candidate predicted motion vectors and one or more additional candidate predicted motion vectors derived from the original candidates. The original candidate predicted motion vectors in the list may include one or more spatial candidate predicted motion vectors and a temporal candidate predicted motion vector. A spatial candidate predicted motion vector may indicate the motion information of other PUs in the current image. A temporal candidate predicted motion vector may be based on the motion information of a corresponding PU in an image other than the current image. The temporal candidate predicted motion vector may also be referred to as temporal motion vector prediction (TMVP).
After generating the candidate predicted motion vector list, the inter prediction module 121 may select one of the candidate predicted motion vectors from the list. The inter prediction module 121 may then generate a predictive image block for the PU based on the reference block indicated by the motion information of the PU. In merge mode, the motion information of the PU may be the same as the motion information indicated by the selected candidate predicted motion vector. FIG. 5, described below, illustrates an exemplary flowchart of merge.
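The merge behavior above — build a candidate list, then have the PU inherit the selected candidate's motion information unchanged — can be sketched as follows. The neighbour motion data and the duplicate-pruning rule are illustrative assumptions.

```python
# Hypothetical (MV, ref_idx) pairs from spatial neighbour PUs and a TMVP.
spatial  = [((2, 1), 0), ((2, 1), 0), ((0, 3), 1)]
temporal = [((1, 1), 0)]

merge_list = []
for cand in spatial + temporal:
    if cand not in merge_list:   # prune duplicate candidates
        merge_list.append(cand)

merge_idx = 1                    # index signalled in the bitstream
pu_motion = merge_list[merge_idx]
# In merge mode the PU's motion information equals the selected candidate's:
print(merge_list, pu_motion)
```

Note that, unlike AMVP, no MVD is signalled: the index alone determines the PU's motion information.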
After generating a predictive image block for the PU based on IME and FME and generating a predictive image block for the PU based on the merge operation, the inter prediction module 121 may select either the predictive image block produced by the FME operation or the predictive image block produced by the merge operation. In some feasible implementations, the inter prediction module 121 may select the predictive image block for the PU based on a rate-distortion cost analysis of the predictive image block produced by the FME operation and the predictive image block produced by the merge operation.
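A minimal sketch of the rate-distortion selection mentioned above, using the standard Lagrangian cost J = D + λ·R. The distortion, bit estimates, and λ are hypothetical numbers for illustration.

```python
def rd_cost(distortion, bits, lam):
    # Lagrangian rate-distortion cost: J = D + lambda * R.
    return distortion + lam * bits

# Hypothetical candidates: (name, distortion, estimated bits).
# Merge typically costs very few bits since no MVD is signalled.
modes = [("FME", 120.0, 40), ("merge", 150.0, 6)]
lam = 2.0

best = min(modes, key=lambda m: rd_cost(m[1], m[2], lam))
print(best[0])  # -> merge  (162.0 beats 200.0 at this lambda)
```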
After the inter prediction module 121 has selected the predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes (in some embodiments, after a coding tree unit (CTU) is divided into CUs, a CU is not further divided into smaller PUs, in which case the PU is equivalent to the CU), the inter prediction module 121 may select a partitioning mode for the current CU. In some embodiments, the inter prediction module 121 may select the partitioning mode for the current CU based on a rate-distortion cost analysis of the selected predictive image blocks of the PUs generated by partitioning the current CU according to each of the partitioning modes. The inter prediction module 121 may output the predictive image blocks associated with the PUs belonging to the selected partitioning mode to the residual generation module 102, and may output syntax elements indicating the motion information of the PUs belonging to the selected partitioning mode to the entropy encoding module 116.
In the schematic diagram of FIG. 4, the inter prediction module 121 includes IME modules 180A to 180N (collectively, "IME modules 180"), FME modules 182A to 182N (collectively, "FME modules 182"), merge modules 184A to 184N (collectively, "merge modules 184"), PU mode decision modules 186A to 186N (collectively, "PU mode decision modules 186"), and a CU mode decision module 188 (which may also perform the mode decision process from CTU to CU).
The IME modules 180, the FME modules 182, and the merge modules 184 may perform IME operations, FME operations, and merge operations on the PUs of the current CU. The schematic diagram of FIG. 4 illustrates the inter prediction module 121 as including a separate IME module 180, FME module 182, and merge module 184 for each PU of each partitioning mode of the CU. In other feasible implementations, the inter prediction module 121 does not include a separate IME module 180, FME module 182, and merge module 184 for each PU of each partitioning mode of the CU.
As illustrated in the schematic diagram of FIG. 4, the IME module 180A, the FME module 182A, and the merge module 184A may perform an IME operation, an FME operation, and a merge operation on the PU generated by partitioning the CU according to the 2N×2N partitioning mode. The PU mode decision module 186A may select one of the predictive image blocks generated by the IME module 180A, the FME module 182A, and the merge module 184A.
The IME module 180B, the FME module 182B, and the merge module 184B may perform an IME operation, an FME operation, and a merge operation on the left PU generated by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186B may select one of the predictive image blocks generated by the IME module 180B, the FME module 182B, and the merge module 184B.
The IME module 180C, the FME module 182C, and the merge module 184C may perform an IME operation, an FME operation, and a merge operation on the right PU generated by partitioning the CU according to the N×2N partitioning mode. The PU mode decision module 186C may select one of the predictive image blocks generated by the IME module 180C, the FME module 182C, and the merge module 184C.
The IME module 180N, the FME module 182N, and the merge module 184N may perform an IME operation, an FME operation, and a merge operation on the lower-right PU generated by partitioning the CU according to the N×N partitioning mode. The PU mode decision module 186N may select one of the predictive image blocks generated by the IME module 180N, the FME module 182N, and the merge module 184N.
The PU mode decision modules 186 may select a predictive image block based on a rate-distortion cost analysis of multiple possible predictive image blocks, choosing the predictive image block that provides the best rate-distortion cost for a given decoding situation. For example, for bandwidth-limited applications, a PU mode decision module 186 may favor predictive image blocks that increase the compression ratio, while for other applications it may favor predictive image blocks that increase the quality of the reconstructed video. After the PU mode decision modules 186 select the predictive image blocks for the PUs of the current CU, the CU mode decision module 188 selects the partitioning mode for the current CU and outputs the predictive image blocks and the motion information of the PUs belonging to the selected partitioning mode.
FIG. 5 is an exemplary flowchart of a merge mode in an embodiment of this application. A video encoder (for example, the video encoder 100) may perform a merge operation 200. In other feasible implementations, the video encoder may perform a merge operation different from the merge operation 200. For example, in other feasible implementations, the video encoder may perform a merge operation in which it performs more or fewer steps than the merge operation 200, or steps different from those of the merge operation 200. In other feasible implementations, the video encoder may perform the steps of the merge operation 200 in a different order or in parallel. The encoder may also perform the merge operation 200 on a PU encoded in skip mode.
After the video encoder starts the merge operation 200, the video encoder may generate a candidate predicted motion vector list for the current PU (202). The video encoder may generate the candidate predicted motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate predicted motion vector list for the current PU according to one of the example techniques described below with respect to FIGS. 8 to 12.
As described above, the candidate predicted motion vector list for the current PU may include a temporal candidate predicted motion vector. The temporal candidate predicted motion vector may indicate the motion information of a temporally co-located PU. The co-located PU may be spatially at the same position in the image frame as the current PU, but in a reference image rather than the current image. In this application, a reference image that includes the temporally corresponding PU may be referred to as a related reference image, and the reference image index of the related reference image may be referred to as a related reference image index. As described above, the current image may be associated with one or more reference image lists (for example, list 0, list 1, etc.). A reference image index may indicate a reference image by indicating its position in a certain reference image list. In some feasible implementations, the current image may be associated with a combined reference image list.
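A common way to derive a temporal candidate (TMVP) from the co-located PU is to take the co-located PU's motion vector and scale it by the ratio of POC distances. This scaling scheme and all numbers below are illustrative assumptions, not the specific derivation claimed by this application.

```python
def scale_mv(mv, cur_diff, col_diff):
    # Scale the co-located MV by the ratio of POC distances:
    # current picture's distance to its reference vs. the co-located
    # picture's distance to the reference its MV points at.
    return tuple(v * cur_diff // col_diff for v in mv)

col_mv   = (8, -4)  # MV of the co-located PU in the related reference image
col_diff = 4        # POC distance spanned by the co-located PU's MV
cur_diff = 2        # POC distance from the current picture to its reference

tmvp = scale_mv(col_mv, cur_diff, col_diff)
print(tmvp)  # temporal candidate added to the candidate list
```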
In some video encoders, the related reference image index is the reference image index of the PU that covers the reference index source location associated with the current PU. In these video encoders, the reference index source location associated with the current PU is adjacent to the left of, or adjacent above, the current PU. In this application, a PU may "cover" a specific location if the image block associated with the PU includes that location. In these video encoders, if the reference index source location is unavailable, the video encoder may use a reference image index of zero.
However, there may be cases in which the reference index source position associated with the current PU is within the current CU. In these cases, the PU covering the reference index source position associated with the current PU may be considered available if that PU is above or to the left of the current CU. However, the video encoder may then need to access the motion information of another PU of the current CU in order to determine the reference picture containing the co-located PU. Therefore, these video encoders may use the motion information (that is, the reference picture index) of a PU belonging to the current CU to generate the temporal candidate prediction motion vector for the current PU. In other words, these video encoders may use the motion information of a PU belonging to the current CU to generate the temporal candidate prediction motion vector. Consequently, the video encoder may be unable to generate, in parallel, the candidate prediction motion vector lists for the current PU and for the PU covering the reference index source position associated with the current PU.
According to the techniques of this application, the video encoder may explicitly set the relevant reference picture index without referring to the reference picture index of any other PU. This may enable the video encoder to generate candidate prediction motion vector lists for the current PU and the other PUs of the current CU in parallel. Because the video encoder sets the relevant reference picture index explicitly, the relevant reference picture index is not based on the motion information of any other PU of the current CU. In some feasible implementations in which the video encoder explicitly sets the relevant reference picture index, the video encoder may always set the relevant reference picture index to a fixed, predefined preset reference picture index (for example, 0). In this way, the video encoder may generate a temporal candidate prediction motion vector based on the motion information of the co-located PU in the reference frame indicated by the preset reference picture index, and may include the temporal candidate prediction motion vector in the candidate prediction motion vector list of the current CU.
In a feasible implementation in which the video encoder explicitly sets the relevant reference picture index, the video encoder may explicitly signal the relevant reference picture index in a syntax structure (for example, a picture header, a slice header, an APS, or another syntax structure). In this feasible implementation, the video encoder may signal to the decoder the relevant reference picture index for each LCU (that is, CTU), CU, PU, TU, or other type of sub-block. For example, the video encoder may signal that the relevant reference picture index for each PU of a CU is equal to "1".
In some feasible implementations, the relevant reference picture index may be set implicitly rather than explicitly. In these feasible implementations, the video encoder may generate each temporal candidate prediction motion vector in the candidate prediction motion vector lists for the PUs of the current CU using the motion information of PUs in the reference pictures indicated by the reference picture indices of PUs covering positions outside the current CU, even if these positions are not strictly adjacent to the current PU.
After generating the candidate prediction motion vector list for the current PU, the video encoder may generate the predictive image blocks associated with the candidate prediction motion vectors in the list (204). The video encoder may generate the predictive image block associated with a candidate prediction motion vector by determining the motion information of the current PU based on the motion information of the indicated candidate prediction motion vector and then generating the predictive image block based on the one or more reference blocks indicated by the motion information of the current PU. The video encoder may then select one of the candidate prediction motion vectors from the candidate prediction motion vector list (206). The video encoder may select the candidate prediction motion vector in various ways. For example, the video encoder may select one of the candidate prediction motion vectors based on a rate-distortion cost analysis of each of the predictive image blocks associated with the candidate prediction motion vectors.
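The selection in step 206 can be sketched as follows. This is a minimal illustration rather than the encoder's actual implementation: `rd_cost` is a hypothetical callback standing in for whatever rate-distortion measure the encoder applies to each candidate's predictive image block.

```python
def select_merge_candidate(candidates, rd_cost):
    """Pick the candidate prediction motion vector whose predictive image
    block has the lowest rate-distortion cost. `candidates` is the candidate
    prediction motion vector list; `rd_cost(index, motion_info)` is a
    hypothetical callback returning the cost of coding the current PU with
    that candidate."""
    best_idx, best_cost = None, float("inf")
    for idx, motion_info in enumerate(candidates):
        cost = rd_cost(idx, motion_info)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx  # signalled to the decoder as the index (e.g., merge_idx)
```

The returned index is what step 208 outputs: the position of the selected candidate in the list, not the motion vector itself.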
After selecting a candidate prediction motion vector, the video encoder may output the candidate prediction motion vector index (208). The candidate prediction motion vector index may indicate the position of the selected candidate prediction motion vector in the candidate prediction motion vector list. In some feasible implementations, the candidate prediction motion vector index may be represented as "merge_idx".
FIG. 6 is an exemplary flowchart of the advanced motion vector prediction (AMVP) mode in an embodiment of this application. A video encoder (for example, video encoder 100) may perform AMVP operation 210.
After the video encoder starts AMVP operation 210, the video encoder may generate one or more motion vectors for the current PU (211). The video encoder may perform integer motion estimation and fractional motion estimation to generate the motion vectors for the current PU. As described above, the current picture may be associated with two reference picture lists (list 0 and list 1). If the current PU is unidirectionally predicted, the video encoder may generate a list 0 motion vector or a list 1 motion vector for the current PU. The list 0 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference picture in list 0. The list 1 motion vector may indicate the spatial displacement between the image block of the current PU and a reference block in a reference picture in list 1. If the current PU is bidirectionally predicted, the video encoder may generate both a list 0 motion vector and a list 1 motion vector for the current PU.
After generating the one or more motion vectors for the current PU, the video encoder may generate the predictive image block for the current PU (212). The video encoder may generate the predictive image block for the current PU based on the one or more reference blocks indicated by the one or more motion vectors of the current PU.
In addition, the video encoder may generate a candidate prediction motion vector list for the current PU (213). The video encoder may generate the candidate prediction motion vector list for the current PU in various ways. For example, the video encoder may generate the candidate prediction motion vector list for the current PU according to one or more of the feasible implementations described below with respect to FIG. 8 to FIG. 12. In some feasible implementations, when the video encoder generates the candidate prediction motion vector list in AMVP operation 210, the list may be limited to two candidate prediction motion vectors. In contrast, when the video encoder generates a candidate prediction motion vector list in a merge operation, the list may include more candidate prediction motion vectors (for example, five candidate prediction motion vectors).
After generating the candidate prediction motion vector list for the current PU, the video encoder may generate one or more motion vector differences (MVDs) for each candidate prediction motion vector in the list (214). The video encoder may generate the motion vector difference for a candidate prediction motion vector by determining the difference between the motion vector indicated by the candidate prediction motion vector and the corresponding motion vector of the current PU.
If the current PU is unidirectionally predicted, the video encoder may generate a single MVD for each candidate prediction motion vector. If the current PU is bidirectionally predicted, the video encoder may generate two MVDs for each candidate prediction motion vector. The first MVD may indicate the difference between the motion vector of the candidate prediction motion vector and the list 0 motion vector of the current PU. The second MVD may indicate the difference between the motion vector of the candidate prediction motion vector and the list 1 motion vector of the current PU.
The video encoder may select one or more of the candidate prediction motion vectors from the candidate prediction motion vector list (215). The video encoder may select the one or more candidate prediction motion vectors in various ways. For example, the video encoder may select the candidate prediction motion vector whose associated motion vector matches the motion vector to be encoded with the smallest error, which may reduce the number of bits needed to represent the motion vector difference for the candidate prediction motion vector.
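Steps 214 and 215 can be sketched as follows. This is a minimal illustration under the assumption that motion vectors are simple (x, y) integer pairs and that the bit cost of an MVD is approximated by the sum of its absolute components; an actual encoder's cost model would differ.

```python
def mvd(mv, predictor):
    """Motion vector difference between the PU's motion vector and a
    candidate prediction motion vector; both are (x, y) pairs."""
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def select_predictor(mv, candidates):
    """Select the candidate whose MVD is cheapest to code, approximating
    the bit cost by the sum of absolute MVD components (an assumption for
    illustration). Returns the candidate index to signal and the MVD."""
    costs = [abs(mv[0] - c[0]) + abs(mv[1] - c[1]) for c in candidates]
    best = costs.index(min(costs))
    return best, mvd(mv, candidates[best])
```

For a bidirectionally predicted PU, this selection would simply be performed twice, once against the list 0 motion vector and once against the list 1 motion vector.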
After selecting the one or more candidate prediction motion vectors, the video encoder may output, for the current PU, one or more reference picture indices, one or more candidate prediction motion vector indices, and one or more motion vector differences for the one or more selected candidate prediction motion vectors (216).
In the case where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is unidirectionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") or a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. Alternatively, the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector or the list 1 motion vector of the current PU.
In the case where the current picture is associated with two reference picture lists (list 0 and list 1) and the current PU is bidirectionally predicted, the video encoder may output a reference picture index for list 0 ("ref_idx_l0") and a reference picture index for list 1 ("ref_idx_l1"). The video encoder may also output a candidate prediction motion vector index ("mvp_l0_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. In addition, the video encoder may output a candidate prediction motion vector index ("mvp_l1_flag") indicating the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. The video encoder may also output the MVD for the list 0 motion vector of the current PU and the MVD for the list 1 motion vector of the current PU.
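For a bidirectionally predicted PU, the signalled elements described above can be grouped as in the following sketch. The field names follow the HEVC-style syntax element names referenced in the text; the structure itself is illustrative, not a normative bitstream layout, and the (x, y) pair representation of an MVD is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class AmvpBiPredSignal:
    """Elements output for one bidirectionally predicted PU in AMVP mode."""
    ref_idx_l0: int          # reference picture index into list 0
    ref_idx_l1: int          # reference picture index into list 1
    mvp_l0_flag: int         # position of the selected list 0 candidate
    mvp_l1_flag: int         # position of the selected list 1 candidate
    mvd_l0: Tuple[int, int]  # MVD for the list 0 motion vector
    mvd_l1: Tuple[int, int]  # MVD for the list 1 motion vector
```

For a unidirectionally predicted PU, only the list 0 or the list 1 triple (reference picture index, candidate index, MVD) would be present.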
FIG. 7 is an exemplary flowchart of motion compensation performed by a video decoder (for example, video decoder 200) in an embodiment of this application.
When the video decoder performs motion compensation operation 220, the video decoder may receive an indication of the selected candidate prediction motion vector for the current PU (222). For example, the video decoder may receive a candidate prediction motion vector index indicating the position of the selected candidate prediction motion vector within the candidate prediction motion vector list of the current PU.
If the motion information of the current PU is encoded using the AMVP mode and the current PU is bidirectionally predicted, the video decoder may receive a first candidate prediction motion vector index and a second candidate prediction motion vector index. The first candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 0 motion vector of the current PU. The second candidate prediction motion vector index indicates the position, in the candidate prediction motion vector list, of the selected candidate prediction motion vector for the list 1 motion vector of the current PU. In some feasible implementations, a single syntax element may be used to identify the two candidate prediction motion vector indices.
In addition, the video decoder may generate a candidate prediction motion vector list for the current PU (224). The video decoder may generate this candidate prediction motion vector list for the current PU in various ways. For example, the video decoder may use the techniques described below with reference to FIG. 8 to FIG. 12 to generate the candidate prediction motion vector list for the current PU. When the video decoder generates a temporal candidate prediction motion vector for the candidate prediction motion vector list, the video decoder may explicitly or implicitly set the reference picture index identifying the reference picture that includes the co-located PU, as described above with respect to FIG. 5.
After generating the candidate prediction motion vector list for the current PU, the video decoder may determine the motion information of the current PU based on the motion information indicated by the one or more selected candidate prediction motion vectors in the candidate prediction motion vector list for the current PU (225). For example, if the motion information of the current PU is encoded using the merge mode, the motion information of the current PU may be the same as the motion information indicated by the selected candidate prediction motion vector. If the motion information of the current PU is encoded using the AMVP mode, the video decoder may reconstruct the one or more motion vectors of the current PU using the one or more motion vectors indicated by the one or more selected candidate prediction motion vectors and the one or more MVDs indicated in the bitstream. The reference picture index and the prediction direction identifier of the current PU may be the same as the reference picture index and the prediction direction identifier of the one or more selected candidate prediction motion vectors. After determining the motion information of the current PU, the video decoder may generate the predictive image block for the current PU based on the one or more reference blocks indicated by the motion information of the current PU (226).
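The AMVP reconstruction in step 225 amounts to adding each signalled MVD back to the predictor selected from the list. A minimal sketch, again under the assumption that motion vectors are (x, y) integer pairs:

```python
def reconstruct_mv(candidates, mvp_index, mvd):
    """Rebuild one motion vector on the decoder side. `candidates` is the
    candidate prediction motion vector list (derived identically to the
    encoder's), `mvp_index` is the signalled candidate index, and `mvd` is
    the signalled motion vector difference."""
    predictor = candidates[mvp_index]
    return (predictor[0] + mvd[0], predictor[1] + mvd[1])
```

Because the decoder derives the same candidate list as the encoder, signalling only the index and the MVD is sufficient to recover the motion vector exactly.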
FIG. 8 is an exemplary schematic diagram of a coding unit (CU) and the neighbouring position image blocks associated with it in an embodiment of this application, illustrating CU 250 and exemplary candidate prediction motion vector positions 252A to 252E associated with CU 250. This application may collectively refer to candidate prediction motion vector positions 252A to 252E as candidate prediction motion vector positions 252. Candidate prediction motion vector positions 252 represent spatial candidate prediction motion vectors in the same picture as CU 250. Candidate prediction motion vector position 252A is located to the left of CU 250. Candidate prediction motion vector position 252B is located above CU 250. Candidate prediction motion vector position 252C is located above and to the right of CU 250. Candidate prediction motion vector position 252D is located below and to the left of CU 250. Candidate prediction motion vector position 252E is located above and to the left of CU 250. FIG. 8 provides a schematic implementation of the manner in which inter prediction module 121 and motion compensation module 162 may generate candidate prediction motion vector lists. The implementations below are explained with reference to inter prediction module 121, but it should be understood that motion compensation module 162 may implement the same techniques and thus generate the same candidate prediction motion vector list.
FIG. 9 is an exemplary flowchart of constructing a candidate prediction motion vector list in an embodiment of this application. The technique of FIG. 9 will be described with reference to a list that includes five candidate prediction motion vectors, but the techniques described herein may also be used with lists of other sizes. The five candidate prediction motion vectors may each have an index (for example, 0 to 4). The technique of FIG. 9 will be described with reference to a generic video decoder. The generic video decoder may be, for example, a video encoder (such as video encoder 100) or a video decoder (such as video decoder 200).
To construct the candidate prediction motion vector list according to the implementation of FIG. 9, the video decoder first considers four spatial candidate prediction motion vectors (902). The four spatial candidate prediction motion vectors may include candidate prediction motion vector positions 252A, 252B, 252C, and 252D. The four spatial candidate prediction motion vectors correspond to the motion information of four PUs in the same picture as the current CU (for example, CU 250). The video decoder may consider the four spatial candidate prediction motion vectors in the list in a specific order. For example, candidate prediction motion vector position 252A may be considered first. If candidate prediction motion vector position 252A is available, it may be assigned to index 0. If candidate prediction motion vector position 252A is unavailable, the video decoder may not include it in the candidate prediction motion vector list. A candidate prediction motion vector position may be unavailable for various reasons. For example, a candidate prediction motion vector position may be unavailable if it is not within the current picture. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is intra predicted. In another feasible implementation, a candidate prediction motion vector position may be unavailable if it is in a different slice from the current CU.
After considering candidate prediction motion vector position 252A, the video decoder may next consider candidate prediction motion vector position 252B. If candidate prediction motion vector position 252B is available and different from candidate prediction motion vector position 252A, the video decoder may add candidate prediction motion vector position 252B to the candidate prediction motion vector list. In this particular context, the terms "same" and "different" refer to the motion information associated with the candidate prediction motion vector positions. Therefore, two candidate prediction motion vector positions are considered the same if they have the same motion information, and considered different if they have different motion information. If candidate prediction motion vector position 252A is unavailable, the video decoder may assign candidate prediction motion vector position 252B to index 0. If candidate prediction motion vector position 252A is available, the video decoder may assign candidate prediction motion vector position 252B to index 1. If candidate prediction motion vector position 252B is unavailable or the same as candidate prediction motion vector position 252A, the video decoder skips candidate prediction motion vector position 252B and does not include it in the candidate prediction motion vector list.
Candidate prediction motion vector position 252C is similarly considered by the video decoder for inclusion in the list. If candidate prediction motion vector position 252C is available and not the same as candidate prediction motion vector positions 252B and 252A, the video decoder assigns candidate prediction motion vector position 252C to the next available index. If candidate prediction motion vector position 252C is unavailable, or is not different from at least one of candidate prediction motion vector positions 252A and 252B, the video decoder does not include candidate prediction motion vector position 252C in the candidate prediction motion vector list. Next, the video decoder considers candidate prediction motion vector position 252D. If candidate prediction motion vector position 252D is available and not the same as candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder assigns candidate prediction motion vector position 252D to the next available index. If candidate prediction motion vector position 252D is unavailable, or is not different from at least one of candidate prediction motion vector positions 252A, 252B, and 252C, the video decoder does not include candidate prediction motion vector position 252D in the candidate prediction motion vector list. The above implementations generally describe considering candidate prediction motion vectors 252A to 252D, by way of example, for inclusion in the candidate prediction motion vector list, but in some implementations all of candidate prediction motion vectors 252A to 252D may first be added to the candidate prediction motion vector list, with duplicates removed from the list later.
After the video decoder considers the first four spatial candidate prediction motion vectors, the candidate prediction motion vector list may include four spatial candidate prediction motion vectors, or the list may include fewer than four spatial candidate prediction motion vectors. If the list includes four spatial candidate prediction motion vectors (904, yes), the video decoder considers a temporal candidate prediction motion vector (906). The temporal candidate prediction motion vector may correspond to the motion information of the co-located PU of a picture different from the current picture. If the temporal candidate prediction motion vector is available and different from the first four spatial candidate prediction motion vectors, the video decoder assigns the temporal candidate prediction motion vector to index 4. If the temporal candidate prediction motion vector is unavailable or the same as one of the first four spatial candidate prediction motion vectors, the video decoder does not include the temporal candidate prediction motion vector in the candidate prediction motion vector list. Therefore, after the video decoder considers the temporal candidate prediction motion vector (906), the candidate prediction motion vector list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the temporal candidate prediction motion vector considered at block 906), or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902). If the candidate prediction motion vector list includes five candidate prediction motion vectors (908, yes), the video decoder completes building the list.
If the candidate prediction motion vector list includes four candidate prediction motion vectors (908, no), the video decoder may consider a fifth spatial candidate prediction motion vector (910). The fifth spatial candidate prediction motion vector may, for example, correspond to candidate prediction motion vector position 252E. If the candidate prediction motion vector at position 252E is available and different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may add the fifth spatial candidate prediction motion vector to the candidate prediction motion vector list, with the fifth spatial candidate prediction motion vector assigned to index 4. If the candidate prediction motion vector at position 252E is unavailable, or is not different from the candidate prediction motion vectors at positions 252A, 252B, 252C, and 252D, the video decoder may not include the candidate prediction motion vector at position 252E in the candidate prediction motion vector list. Therefore, after the fifth spatial candidate prediction motion vector is considered (910), the list may include five candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902 and the fifth spatial candidate prediction motion vector considered at block 910), or may include four candidate prediction motion vectors (the first four spatial candidate prediction motion vectors considered at block 902).
If the candidate prediction motion vector list includes five candidate prediction motion vectors (912, yes), the video decoder completes generating the candidate prediction motion vector list. If the candidate prediction motion vector list includes four candidate prediction motion vectors (912, no), the video decoder adds artificially generated candidate prediction motion vectors (914) until the list includes five candidate prediction motion vectors (916, yes).
If, after the video decoder considers the first four spatial candidates, the list includes fewer than four spatial candidates (904, No), the video decoder may consider a fifth spatial candidate (918). The fifth spatial candidate may, for example, correspond to candidate position 252E. If the candidate at position 252E is available and different from the candidates already included in the list, the video decoder may add the fifth spatial candidate to the list, the fifth spatial candidate being assigned to the next available index. If the candidate at position 252E is unavailable or is not different from one of the candidates already included in the list, the video decoder may not include the candidate at position 252E in the list. The video decoder may then consider a temporal candidate (920). If the temporal candidate is available and different from the candidates already included in the list, the video decoder may add the temporal candidate to the list, the temporal candidate being assigned to the next available index. If the temporal candidate is unavailable or is not different from one of the candidates already included in the list, the video decoder may not include the temporal candidate in the list.
If, after the fifth spatial candidate (block 918) and the temporal candidate (block 920) are considered, the candidate predicted motion vector list includes five candidates (922, Yes), the video decoder finishes generating the list. If the list includes fewer than five candidates (922, No), the video decoder adds artificially generated candidates (914) until the list includes five candidates (916, Yes).
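As a non-normative sketch of the blocks 902-922 flow described above (function and variable names are illustrative, not from the patent), the list construction can be expressed as:

```python
MAX_CANDIDATES = 5  # fixed list size, per the FIG. 9 implementation

def build_merge_list(first_four_spatial, fifth_spatial, temporal, artificial):
    """Candidates are motion-information tuples; None marks an
    unavailable position. Duplicates are never added twice."""
    cands = []
    for c in first_four_spatial:              # blocks 902/904
        if c is not None and c not in cands:
            cands.append(c)
    if len(cands) == 4:                       # 904 Yes -> blocks 908/910
        if fifth_spatial is not None and fifth_spatial not in cands:
            cands.append(fifth_spatial)       # assigned index 4
    else:                                     # 904 No -> blocks 918/920
        if fifth_spatial is not None and fifth_spatial not in cands:
            cands.append(fifth_spatial)
        if temporal is not None and temporal not in cands:
            cands.append(temporal)
    for c in artificial:                      # blocks 914/916: pad to five
        if len(cands) >= MAX_CANDIDATES:
            break
        cands.append(c)
    return cands
```

With four distinct spatial candidates plus an available fifth, the artificial padding never runs; with duplicates or unavailable positions, the temporal and artificial candidates fill the remaining slots.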
According to the techniques of this application, additional merge candidates may be artificially generated after the spatial and temporal candidates so that the size of the merge candidate list is fixed at a specified number of merge candidates (for example, five in the feasible implementation of FIG. 9 above). The additional merge candidates may include, exemplarily, a combined bi-predictive merge candidate (candidate 1), a scaled bi-predictive merge candidate (candidate 2), and a zero-vector Merge/AMVP candidate (candidate 3).
FIG. 10 is an exemplary schematic diagram of adding a combined candidate motion vector to the merge-mode candidate list in an embodiment of this application. A combined bi-predictive merge candidate may be generated by combining original merge candidates. Specifically, two of the original candidates (having mvL0 and refIdxL0, or mvL1 and refIdxL1) may be used to generate a bi-predictive merge candidate. In FIG. 10, two candidates are included in the original merge candidate list. The prediction type of one candidate is list-0 uni-prediction, and the prediction type of the other is list-1 uni-prediction. In this feasible implementation, mvL0_A and ref0 are taken from list 0, and mvL1_B and ref0 are taken from list 1; a bi-predictive merge candidate (with mvL0_A and ref0 in list 0 and mvL1_B and ref0 in list 1) may then be generated and checked against the candidates already included in the list. If it is different from them, the video decoder may include the bi-predictive merge candidate in the candidate list.
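A minimal sketch of the FIG. 10 combination, under the assumption that a candidate is represented as a (motion vector, reference index) pair (the representation and names are hypothetical):

```python
def combined_bipred_candidate(list0_cand, list1_cand, cand_list):
    """Pair a list-0 uni-prediction candidate (mvL0_A, ref0) with a
    list-1 candidate (mvL1_B, ref0) into one bi-predictive candidate,
    adding it only if it differs from every existing candidate."""
    mvL0_A, ref0 = list0_cand
    mvL1_B, ref0_l1 = list1_cand
    combined = (('L0', mvL0_A, ref0), ('L1', mvL1_B, ref0_l1))
    if combined not in cand_list:
        cand_list.append(combined)
    return cand_list
```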
FIG. 11 is an exemplary schematic diagram of adding a scaled candidate motion vector to the merge-mode candidate list in an embodiment of this application. A scaled bi-predictive merge candidate may be generated by scaling an original merge candidate. Specifically, one candidate from the original candidates (which may have mvLX and refIdxLX) may be used to generate a bi-predictive merge candidate. In the feasible implementation of FIG. 11, two candidates are included in the original merge candidate list. The prediction type of one candidate is list-0 uni-prediction, and the prediction type of the other is list-1 uni-prediction. In this feasible implementation, mvL0_A and ref0 may be taken from list 0, and ref0 may be copied into list 1 as reference index ref0′. Then mvL0′_A may be calculated by scaling mvL0_A with ref0 and ref0′; the scaling may depend on the POC distance. Next, a bi-predictive merge candidate (with mvL0_A and ref0 in list 0 and mvL0′_A and ref0′ in list 1) may be generated and checked for being a duplicate. If it is not a duplicate, it may be added to the merge candidate list.
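The POC-distance scaling described above can be sketched as follows. This uses plain floating-point linear scaling as an assumption; a real codec performs this step in fixed-point arithmetic with specific clipping and rounding:

```python
def scale_mv(mv, poc_cur, poc_ref_src, poc_ref_dst):
    """Scale a motion vector by the ratio of POC distances."""
    td = poc_cur - poc_ref_src   # distance to the original reference ref0
    tb = poc_cur - poc_ref_dst   # distance to the copied reference ref0'
    factor = tb / td
    return (mv[0] * factor, mv[1] * factor)

def scaled_bipred_candidate(mvL0_A, ref0_poc, cur_poc, ref0p_poc):
    """FIG. 11-style sketch: copy ref0 into list 1 as ref0', derive
    mvL0'_A by POC-distance scaling, and pair the two lists."""
    mvL0p_A = scale_mv(mvL0_A, cur_poc, ref0_poc, ref0p_poc)
    return {'L0': (mvL0_A, ref0_poc), 'L1': (mvL0p_A, ref0p_poc)}
```

For example, doubling the POC distance doubles the scaled vector's components.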
FIG. 12 is an exemplary schematic diagram of adding a zero motion vector to the merge-mode candidate list in an embodiment of this application. A zero-vector merge candidate may be generated by combining a zero vector with a referenceable reference index. If the zero-vector candidate is not a duplicate, it may be added to the merge candidate list. For each generated merge candidate, its motion information may be compared with the motion information of the preceding candidates in the list.
In a feasible implementation, if a newly generated candidate is different from the candidates already included in the candidate list, the generated candidate is added to the merge candidate list. The process of determining whether a candidate is different from the candidates already included in the list is sometimes called pruning. With pruning, each newly generated candidate may be compared with the existing candidates in the list. In some feasible implementations, the pruning operation may include comparing one or more new candidates with the candidates already in the list and not adding a new candidate that is a duplicate of a candidate already in the list. In other feasible implementations, the pruning operation may include adding one or more new candidates to the list and later removing duplicate candidates from the list. It should be understood that, in still other feasible implementations, the pruning step may be omitted.
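The two pruning variants just described can be sketched as follows (an illustrative sketch; candidate representation is assumed to be hashable tuples):

```python
def prune_before_add(cand_list, new_cands):
    """Variant 1: compare each newly generated candidate against the
    list and skip duplicates before adding."""
    for c in new_cands:
        if c not in cand_list:
            cand_list.append(c)
    return cand_list

def add_then_dedupe(cand_list, new_cands):
    """Variant 2: add everything first, then remove duplicates,
    keeping the first occurrence of each candidate."""
    merged = list(cand_list) + list(new_cands)
    out = []
    for c in merged:
        if c not in out:
            out.append(c)
    return out
```

Both variants produce the same final list; they differ only in when the duplicate check happens.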
In the various feasible implementations of FIGS. 5-7 and 9-12 above, the spatial candidate prediction modes exemplarily come from the five positions 252A to 252E shown in FIG. 8, that is, positions adjacent to the image block to be processed. On the basis of those implementations, in some feasible implementations the spatial candidate prediction modes may exemplarily further include positions within a preset distance of the image block to be processed but not adjacent to it. Exemplarily, such positions may be as shown at 252F to 252J in FIG. 13. It should be understood that FIG. 13 is an exemplary schematic diagram of a coding unit and the neighboring-position image blocks associated with it in an embodiment of this application. Positions within image blocks that are in the same frame as the image block to be processed, are not adjacent to it, and have already been reconstructed when the image block to be processed is processed all fall within the range of such positions.
FIG. 14 is an exemplary flowchart of a motion information prediction method in an embodiment of this application, which specifically includes the following steps:
S1401. Obtain at least two target pixels having a preset positional relationship with the image block to be processed.
The target pixels include a first candidate pixel adjacent to the image block to be processed and a second candidate pixel located to the left of the image block to be processed and not adjacent to it.
FIG. 8 schematically shows the adjacent positions 252A-252E of coding unit 250, and FIG. 13 schematically shows the non-adjacent positions 252F-252J of coding unit 250. It should be understood that each of the above positions may be used to indicate either the image block covering that position or the pixel at that position. It should also be understood that the image block to be processed in the embodiments of this application is a set of pixels to be processed and is not limited to a coding unit, a coding sub-unit, or a prediction unit; correspondingly, coding unit 250 may serve as the image block to be processed in the embodiments of this application.
The left side of the image block to be processed includes the region directly to its left (for example, the position corresponding to 252A), and may also include the upper left (for example, the position corresponding to 252D) and the lower left (for example, the position corresponding to 252E).
In a feasible implementation, as shown in FIG. 15, a rectangular coordinate system may be established with the pixel at the top-left vertex of the image block to be processed as the origin, the line along the top edge of the block as the horizontal axis with rightward as the positive direction, and the line along the left edge of the block as the vertical axis with downward as the positive direction.
In an embodiment of the present invention, the position of the second candidate pixel may be at least one of the following coordinate points in this coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, the position of the second candidate pixel may exclude the region to the upper left of the image block to be processed; that is, the position of the second candidate pixel may be at least one of the following coordinate points in this coordinate system: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j+h-1), (-w×i-1, h×j+h-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the image block to be processed, and h is the height of the image block to be processed.
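The coordinate formulas above can be enumerated directly. The following sketch generates the first set of second-candidate positions for bounded i and j (the bounds `i_max` and `j_max` are illustrative parameters, not from the text):

```python
def second_candidate_positions(w, h, i_max, j_max):
    """Enumerate the non-adjacent positions (-1, h*i-1+h), (-1, h*i+h),
    (-w*i, h*j-1), (-w*i-1, h*j-1), (-w*i, h*j), (-w*i-1, h*j) in the
    FIG. 15 coordinate system (origin at the top-left vertex of the
    block, x positive rightward, y positive downward)."""
    pts = set()
    for i in range(1, i_max + 1):
        pts.add((-1, h * i - 1 + h))
        pts.add((-1, h * i + h))
        for j in range(0, j_max + 1):
            pts.add((-w * i, h * j - 1))
            pts.add((-w * i - 1, h * j - 1))
            pts.add((-w * i, h * j))
            pts.add((-w * i - 1, h * j))
    return pts
```

All generated positions have a negative x-coordinate, i.e., they lie strictly to the left of the block's left edge, consistent with the "left side and not adjacent" constraint when i > 0.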
In another feasible implementation, once the motion information of an image block has been determined, it is stored in a motion vector matrix for use when processing subsequent image blocks. For example, the whole frame may be mapped onto a set of pixel units in which each 4x4 set of pixels is one unit, each 4x4 set corresponding to one piece of motion information; extracting the motion information corresponding to each 4x4 set then forms a motion information matrix corresponding to the original image, which may also be called a motion vector field. In the embodiments of this application, the positions described above are obtained by sampling the motion information matrix corresponding to the image in which the image block to be processed is located, where w is the sampling width interval of the motion vector field and h is the sampling height interval. It should be understood that in this implementation, the determination of w is independent of the width of the image block to be processed, and the determination of h is independent of its height.
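A minimal sketch of sampling such a motion vector field, under the simplifying assumption that w and h are expressed as strides over the 4x4-unit matrix rather than in pixels:

```python
def sample_mv_field(mv_field, w, h):
    """mv_field: per-4x4-unit motion matrix of the frame (list of rows,
    one entry per unit). Returns the subsampled grid taken every h rows
    and every w columns."""
    return [row[::w] for row in mv_field[::h]]
```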
In a feasible implementation, this step includes: determining, within the coding units of previously reconstructed image blocks preceding the image block to be processed, pixels that are located to the left of the image block to be processed and are not adjacent to it.
Specifically, the pixel at the lower-left corner of the image block to be processed may be taken as a reference point, and the line along the lower edge of the block as a reference line. One or more anchor points located to the left of the reference point and lying on the reference line are determined. The coding unit (or prediction unit) containing each anchor point is determined, and at least one of the neighboring point above the top-left corner of that coding unit and the neighboring point above its top-right corner is taken as a target pixel.
A plurality of derived lines parallel to the reference line are then determined in turn at a preset step size, the derived lines lying below the image block to be processed. Taking each derived line as the new reference line, and the intersection of the derived line with the line along the left edge of the image block to be processed as the new reference point, the step of determining target pixels is repeated to obtain at least one new target pixel.
It should be understood that when the coding unit determined from a new reference point is the same coding unit as one determined from a previous reference point, the target pixel is not obtained again.
It should also be understood that this implementation determines the positions of at least two target pixels having a preset positional relationship with the image block to be processed; the order in which they are obtained follows the description of the preset order below.
It should also be understood that, in a feasible implementation, in order not to add a new line buffer, when a target pixel position in the above embodiments lies above the image block to be processed and is one or more pixels away from the line along the top edge of the block, that target pixel position is discarded.
In a feasible implementation, in order to save motion information storage space, the position range of the second candidate pixel needs to be limited. For example, in the above coordinate system, the horizontal coordinate of the second candidate pixel may not exceed a boundary value, that is, w×i is less than or equal to a first threshold. Specifically, the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
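This range limit can be applied as a simple filter over candidate positions. The sketch below approximates the w×i ≤ threshold condition as a bound on |x|, allowing one extra pixel for the -w×i-1 columns; the concrete CTU width is an illustrative assumption:

```python
def within_first_threshold(positions, first_threshold):
    """Keep only second-candidate positions whose leftward reach stays
    within the first threshold (e.g., one or two CTU widths)."""
    return [(x, y) for (x, y) in positions if -x <= first_threshold + 1]
```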
It should be understood that the above rectangular coordinate system serves only to describe the position of the second candidate pixel more clearly; in an actual embodiment there is no step of establishing a rectangular coordinate system. It should also be understood that, to describe the position of the second candidate pixel more conveniently, various coordinate systems may also be established with other pixel positions as the origin and other lines as the coordinate axes, without limitation.
The embodiments of this application do not limit the position of the first candidate pixel adjacent to the image block to be processed; exemplarily, it may be a point at any one or more of the positions indicated by 252A-252E in FIG. 8.
In a feasible implementation, there are a plurality of second candidate pixels, and obtaining the at least two target pixels having a preset positional relationship with the image block to be processed includes: obtaining the plurality of second candidate pixels among the at least two target pixels in the preset order.
It should be understood that when variable-length coding is applied to the index information of the second candidate pixels, the order of acquisition affects the bit cost of coding the index information. For example, the number of bits used to code the index information of an earlier-acquired second candidate pixel is less than or equal to the number of bits used to code the index information of a later-acquired one; that is, when the earlier-acquired second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the later-acquired second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q. Exemplarily, the binary representation may be a codeword, and the length of the binary representation is the codeword length. The target identification information in the embodiments of this application is discussed later.
In a feasible implementation, the preset order includes: order of distance from short to long, where the distance is the sum of the absolute horizontal and vertical coordinates of the second candidate pixel in the rectangular coordinate system; or right-to-left order; or top-to-bottom order; or a zigzag order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel and the pixel at the bottom-left vertex of the image block to be processed.
Exemplarily, FIG. 16 is a schematic diagram of the right-to-left order, acquired row by row from top to bottom and from right to left within each row; FIG. 17 is a schematic diagram of the top-to-bottom order, acquired column by column from right to left and from top to bottom within each column; and FIG. 18 is a schematic diagram of the order from top-right to bottom-left. The numbers in the figures represent the order of acquisition: the smaller the number, the earlier the acquisition. It should be understood that when the second candidate pixel at a certain position is not acquired, that position is skipped, and the other positions are still acquired in order of increasing number.
It should be understood that the preset orders in the embodiments of this application are not limited to the above three; for example, the orders shown in FIGS. 19-30 may also be included, without limitation.
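Two of the preset orders above reduce to simple sort keys over the FIG. 15 coordinates. This is an illustrative sketch; tie-breaking among equal-distance positions is an assumption (here, stable sort order):

```python
def order_by_distance(pts):
    """Short-to-long distance order, with distance = |x| + |y|."""
    return sorted(pts, key=lambda p: abs(p[0]) + abs(p[1]))

def order_right_to_left(pts):
    """FIG. 16-style order: row by row from top to bottom (smaller y
    first), and within each row from right to left (larger x first)."""
    return sorted(pts, key=lambda p: (p[1], -p[0]))
```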
In a feasible implementation, the obtained positions of the target pixels, or the motion information corresponding to the target pixels, are placed into a candidate motion information list. The specific implementation may be the same as the construction of the candidate motion vector list used in the Merge or AMVP modes of the aforementioned H.265 standard.
In a feasible implementation, as in the aforementioned H.265 standard, a pruning operation is performed; that is, obtaining the at least two target pixels having a preset positional relationship with the image block to be processed includes: sequentially obtaining candidate pixels having the preset positional relationship with the image block to be processed; determining that the motion information of the currently obtained candidate pixel differs from the motion information of the already obtained target pixels; and taking the candidate pixel with different motion information as a target pixel. For the specific implementation, refer to the description of the pruning operation above, which is not repeated here.
In a feasible implementation, the positions of the target pixels or the motion information corresponding to the target pixels may be placed directly into the candidate motion information list, that is, without performing the pruning operation. In this implementation, among the obtained target pixels there may be at least two whose motion information is identical, or it may be the case that no two target pixels have the same motion information.
In a feasible implementation, the number of target pixels obtained may be limited to a preset second threshold. The choice of the second threshold may depend on the specific implementation; for example, the second threshold may be chosen so that the number of target pixels obtained is fixed, or so that the number of target pixels added to the candidate motion information list is fixed, or so that the total number of pieces of motion information in the candidate motion information list is fixed.
S1402. Obtain target identification information.
The target identification information is used to determine the target motion information from the motion information corresponding to the at least two target pixels, where when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, with N less than or equal to M.
The target identification information may be an index used to identify each piece of motion information in the candidate motion information list, different pieces of motion information being distinguished by different index numbers. Different index numbers have different binary representations, and a binary representation may be a codeword. In the embodiments of this application, the length of the codeword for the index number corresponding to the first candidate pixel is less than or equal to the length of the codeword for the index number corresponding to the second candidate pixel.
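One common variable-length binarization that satisfies the N ≤ M (and P ≤ Q) ordering property is truncated unary coding, shown below as a hedged example; the patent does not specify that this particular binarization is used:

```python
def truncated_unary(index, max_index):
    """Codeword for an index in 0..max_index: 'index' ones followed by
    a terminating zero, except the last index, which omits the zero.
    Earlier indices never get longer codewords than later ones."""
    if index < max_index:
        return '1' * index + '0'
    return '1' * index
```

So an index assigned to a first (adjacent) candidate placed early in the list costs no more bits than one assigned to a later, non-adjacent candidate.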
S1403. Predict the motion information of the image block to be processed according to the target motion information.
It should be understood that, compared with the Merge or AMVP techniques in the H.265 standard, the embodiments of this application add new candidate motion information; therefore, the embodiments of this application may be used as improvements to the Merge technique or the AMVP technique.
Specifically, similarly to the Merge technique, the target motion information may be used directly as the motion information of the image block to be processed. Similarly to the AMVP technique, the target motion information may be used as a predictor of the motion information of the image block to be processed; combined with a motion information difference, the motion information of the image block to be processed is obtained.
具体的,该方法可以用于解码待处理图像块,所述方法还包括:解析码流以获得目标运动残差信息;对应的,所述根据所述目标运动信息,预测所述待处理图像块的运动信息,包括:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。其中,本申请实施例中的运动信息可以指运动矢量,该步骤即为将目标标识信息指示的待处理图像块的运动矢量的预测值和解析得到的运动矢量残差值相加,获得待处理图像块的运动矢量。对应的,所述获取目标标识信息,包括:解析所述码流以获得所述目标标识信息。Specifically, the method may be used to decode an image block to be processed. The method further includes: analyzing a code stream to obtain target motion residual information; and correspondingly, predicting the image block to be processed based on the target motion information. The motion information includes: combining the target motion information and the target motion residual information to obtain motion information of the image block to be processed. Wherein, the motion information in the embodiment of the present application may refer to a motion vector. This step is to add the predicted value of the motion vector of the image block to be processed indicated by the target identification information and the residual value of the motion vector obtained by analysis to obtain the to-be-processed Image block motion vector. Correspondingly, the obtaining target identification information includes: parsing the code stream to obtain the target identification information.
具体的,该方法可以用于编码所述目标运动信息;在所述获取目标标识信息之前,还包括:确定编码代价最小的目标运动信息和目标运动残差信息的组合;对应的,所述获取目标标识信息包括:获取所述编码代价最小的目标运动信息在所述至少两个目标运动信息中的标识信息。该方法还包括:编码所述获取的目标标识信息,以及编码所述目标运动残差信息。Specifically, the method may be used to encode the target motion information; before obtaining the target identification information, the method further includes: determining a combination of the target motion information and the target motion residual information with the least encoding cost; and correspondingly, the acquiring The target identification information includes: obtaining identification information of the target motion information with the least coding cost among the at least two target motion information. The method further includes encoding the acquired target identification information and encoding the target motion residual information.
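As a minimal sketch of the decoder-side combination and the encoder-side cost search described above, assuming motion vectors are integer pairs and approximating the true rate-distortion cost by index codeword length plus residual magnitude (the helper names and the cost model are illustrative, not the codec's actual implementation):

```python
def reconstruct_mv(mvp, mvd):
    """Decoder side: add the parsed motion vector difference (residual)
    to the predictor selected by the target identification information."""
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])

def choose_candidate(candidates, true_mv, index_bits):
    """Encoder side: pick the (index, mvd) combination with the lowest
    cost; cost here is index codeword length plus residual magnitude,
    a crude stand-in for a real rate-distortion cost."""
    best = None
    for idx, mvp in enumerate(candidates):
        mvd = (true_mv[0] - mvp[0], true_mv[1] - mvp[1])
        cost = index_bits[idx] + abs(mvd[0]) + abs(mvd[1])
        if best is None or cost < best[0]:
            best = (cost, idx, mvd)
    return best[1], best[2]

candidates = [(4, -2), (3, 0), (16, 16)]
idx, mvd = choose_candidate(candidates, true_mv=(5, -1), index_bits=[1, 2, 2])
# The decoder, given only idx and mvd, recovers the same motion vector.
assert reconstruct_mv(candidates[idx], mvd) == (5, -1)
```

In the Merge-like case the residual is omitted entirely and `reconstruct_mv` degenerates to returning the predictor itself.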
FIG. 31 is another exemplary flowchart of a motion information prediction method according to an embodiment of this application. Specifically, the method includes the following steps:
S3101. Determine availability of at least one target pixel point having a preset positional relationship with a to-be-processed image block.
The target pixel point includes a candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block, where, when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable.
It should be understood that the availability determination is based on factors such as the prediction mode of the image block in which the target pixel point is located, whether the target pixel point lies within the image region, and whether the motion vector corresponding to the position indicated by the target pixel point is necessarily the same as a motion vector corresponding to another position (for example, the manner of determining candidate prediction blocks in Merge mode for rectangular partitioning in the H.265 standard). In a feasible implementation, generally, when the prediction mode of the image block in which the target pixel point is located is inter prediction, the target pixel point is available; however, when the target pixel point lies outside the edge of the image in which the to-be-processed image block is located, or outside the edge of the slice, the target pixel point is also unavailable.
It should be understood that the preset positional relationship may include an adjacent positional relationship and a non-adjacent positional relationship with the to-be-processed image block, as shown by way of example in FIG. 8 and FIG. 13, respectively. The embodiment shown in FIG. 14 discusses in detail the second candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block. In this embodiment of this application, the target pixel point includes the second candidate pixel point in the embodiment shown in FIG. 14, and details are not described again.
In a feasible implementation, the availability of the target pixel point may be determined, that is, the prediction mode of the image block corresponding to the coordinates of the target pixel point is checked.
In another feasible implementation, the availability of the image block in which the target pixel point is located may be determined. In this implementation, to check the prediction mode of the image block, the prediction mode of the image block corresponding to the coordinates of a point in the image block may be checked. The point may be the top-left corner point of the image block, the center point of the image block, or the target pixel point; this is not limited herein.
The availability determination condition includes: when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable. It should be understood that, when the position of the target pixel point is outside the edge of the image or outside the edge of the slice, the target pixel point does not actually exist, or the value of the target pixel point is obtained through derivation and cannot actually be measured; in this case, in a feasible implementation, the target pixel point is also considered unavailable.
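The two availability conditions above (the position falls outside the image, or its block was intra predicted) can be sketched as follows; `block_mode_at`, `INTRA`, and `INTER` are hypothetical stand-ins for the codec's internal state, not names from the specification:

```python
INTRA, INTER = "intra", "inter"

def is_available(x, y, frame_w, frame_h, block_mode_at):
    """A target pixel is unavailable if it lies outside the image (its
    value would only be derivable, not measurable) or if its block was
    intra predicted and therefore carries no motion information."""
    if not (0 <= x < frame_w and 0 <= y < frame_h):
        return False
    return block_mode_at(x, y) != INTRA

# Example: a 64x64 frame whose left half is inter coded, right half intra.
mode = lambda x, y: INTER if x < 32 else INTRA
assert is_available(10, 10, 64, 64, mode) is True
assert is_available(40, 10, 64, 64, mode) is False   # intra block
assert is_available(-1, 10, 64, 64, mode) is False   # outside the image
```

A slice-boundary check would be added in the same way as the image-boundary check, given the slice's extent.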
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
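Under the assumptions above (w and h as preset spacings, w×i bounded by the first threshold), the candidate coordinates can be enumerated as in this illustrative sketch; the loop bounds `i_max` and `j_max` are stand-ins for the thresholds, since the text bounds only w×i explicitly:

```python
def left_candidate_positions(w, h, i_max, j_max):
    """Enumerate the non-adjacent left-side candidate coordinates listed
    above, in the coordinate system whose origin is the top-left vertex
    of the current block (x positive rightward, y positive downward)."""
    points = []
    for i in range(1, i_max + 1):          # i is a positive integer
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, j_max + 1):      # j is a non-negative integer
            points.extend([(-w * i, h * j - 1), (-w * i - 1, h * j - 1),
                           (-w * i, h * j), (-w * i - 1, h * j)])
    return points

# With w = h = 8 (e.g. an 8x8 block) and a CTU-width threshold of 16,
# i may be 1 or 2 (w*i <= 16):
pts = left_candidate_positions(w=8, h=8, i_max=2, j_max=1)
assert (-8, -1) in pts and (-9, 8) in pts and all(x < 0 for x, _ in pts)
```

All enumerated x coordinates are negative, which is what makes these candidates lie to the left of the block.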
In a feasible implementation, there are a plurality of candidate pixel points, and the plurality of candidate pixel points are available. The adding the motion information corresponding to the available target pixel points to the candidate motion information set of the to-be-processed image block includes: adding the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the to-be-processed image block in a preset order, where, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
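The short-to-long distance ordering with the |x|+|y| distance can be sketched as below; the tie-breaking rule is an assumption of this sketch, since the text leaves ties open:

```python
def sort_by_distance(candidates):
    """Order candidate pixel coordinates from short to long distance,
    where distance is |x| + |y| in the block's coordinate system; ties
    are broken here by coordinate (an illustrative choice)."""
    return sorted(candidates, key=lambda p: (abs(p[0]) + abs(p[1]), p))

cands = [(-9, 8), (-1, 15), (-8, -1), (-9, -1)]
ordered = sort_by_distance(cands)
# Nearer candidates come first and thus receive shorter index codewords.
assert ordered[0] == (-8, -1)
assert [abs(x) + abs(y) for x, y in ordered] == sorted(
    abs(x) + abs(y) for x, y in cands)
```

The other preset orders (right-to-left, top-to-bottom, polyline) would simply use a different sort key over the same coordinates.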
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1401 in the embodiment shown in FIG. 14; details are not described again.
S3102. Add the motion information corresponding to the available target pixel points to a candidate motion information set of the to-be-processed image block.
In a feasible implementation, the candidate motion information set includes at least two pieces of identical motion information.
In a feasible implementation, the adding the motion information corresponding to the available target pixel points to the candidate motion information set of the to-be-processed image block includes: sequentially obtaining the available target pixel points; determining that the motion information of the currently obtained available target pixel point is different from the motion information in the candidate motion information set of the to-be-processed image block; and adding the available target pixel points having different motion information to the candidate motion information set of the to-be-processed image block.
In a feasible implementation, the quantity of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
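Taken together, the sequential insertion, the pruning of identical motion information, and the second-threshold cap from the preceding two implementations can be sketched as follows (a simplified model in which each piece of motion information is an integer motion-vector pair):

```python
def build_candidate_set(available_mvs, max_size):
    """Add motion information from available target pixels in order,
    skipping any motion vector already in the set (pruning), and stop
    once the preset second threshold (max_size) is reached."""
    candidate_set = []
    for mv in available_mvs:
        if mv in candidate_set:
            continue  # duplicate of earlier motion information
        candidate_set.append(mv)
        if len(candidate_set) == max_size:
            break
    return candidate_set

mvs = [(1, 0), (1, 0), (2, -3), (0, 0), (2, -3), (5, 5)]
assert build_candidate_set(mvs, max_size=3) == [(1, 0), (2, -3), (0, 0)]
```

In the non-pruning variant, the duplicate check is simply dropped and identical entries may coexist in the set.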
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1401 in the embodiment shown in FIG. 14; details are not described again.
S3103. Obtain target identification information.
The target identification information is used to determine target motion information from the candidate motion information set.
The target identification information may be an index used to indicate each piece of motion information in the candidate motion information list, where different pieces of motion information are distinguished by different index numbers.
S3104. Predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the predicting motion information of the to-be-processed image block based on the target motion information includes: using the target motion information as the motion information of the to-be-processed image block.
In a feasible implementation, the method is used to decode the to-be-processed image block, and further includes: parsing a bitstream to obtain target motion residual information. Correspondingly, the predicting motion information of the to-be-processed image block based on the target motion information includes: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block. Correspondingly, the obtaining target identification information includes: parsing the bitstream to obtain the target identification information.
In a feasible implementation, the method is used to encode the to-be-processed image block, and before the obtaining target identification information, the method further includes: determining a combination of target motion information and target motion residual information with a minimum coding cost. Correspondingly, the obtaining target identification information includes: obtaining identification information of the target motion information with the minimum coding cost among the at least two pieces of target motion information.
In a feasible implementation, the method further includes: encoding the obtained target identification information. The method further includes: encoding the target motion residual information.
The foregoing feasible implementations are similar in technical features to the feasible implementations in the embodiment shown in FIG. 14. For details, refer to the specific description of step S1403 in the embodiment shown in FIG. 14; details are not described again.
FIG. 32 is an exemplary structural block diagram of a motion information prediction apparatus 3200 according to an embodiment of this application. Specifically, the apparatus includes the following modules:
An obtaining module 3201, configured to obtain at least two target pixel points having a preset positional relationship with a to-be-processed image block, where the target pixel points include a first candidate pixel point adjacent to the to-be-processed image block and a second candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block;
An indexing module 3202, configured to obtain target identification information, where the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixel points, and where, when the first candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is M, where N is less than or equal to M; and
A calculation module 3203, configured to predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the position of the second candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are a plurality of second candidate pixel points, and the obtaining module 3201 is specifically configured to obtain the plurality of second candidate pixel points among the at least two target pixel points in the preset order, where, when an earlier-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
In a feasible implementation, among the obtained at least two target pixel points, the motion information of at least two target pixel points is the same.
In a feasible implementation, the obtaining module 3201 is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the to-be-processed image block; determine that the motion information of the currently obtained candidate pixel point is different from the motion information of the already obtained target pixel points; and use the candidate pixel points having different motion information as the target pixel points.
In a feasible implementation, the quantity of obtained target pixel points is a preset second threshold.
In a feasible implementation, the calculation module 3203 is specifically configured to use the target motion information as the motion information of the to-be-processed image block.
In a feasible implementation, the apparatus 3200 is configured to decode the to-be-processed image block, and the indexing module 3202 is further configured to parse a bitstream to obtain target motion residual information. Correspondingly, the calculation module 3203 is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
In a feasible implementation, the indexing module 3202 is specifically configured to parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus 3200 is configured to encode the to-be-processed image block, and the obtaining module 3201 is further configured to determine a combination of target motion information and target motion residual information with a minimum coding cost. Correspondingly, the indexing module 3202 is specifically configured to obtain identification information of the target motion information with the minimum coding cost among the at least two pieces of target motion information.
In a feasible implementation, the indexing module 3202 is further configured to encode the obtained target identification information.
In a feasible implementation, the indexing module 3202 is further configured to encode the target motion residual information.
FIG. 33 is another exemplary structural block diagram of a motion information prediction apparatus 3300 according to an embodiment of this application. Specifically, the apparatus includes the following modules:
A detection module 3301, configured to determine availability of at least one target pixel point having a preset positional relationship with a to-be-processed image block, where the target pixel point includes a candidate pixel point that is located on the left side of the to-be-processed image block and is not adjacent to the to-be-processed image block, and where, when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable;
An obtaining module 3302, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the to-be-processed image block;
An indexing module 3303, configured to obtain target identification information, where the target identification information is used to determine target motion information from the candidate motion information set; and
A calculation module 3304, configured to predict motion information of the to-be-processed image block based on the target motion information.
In a feasible implementation, the detection module 3301 is specifically configured to determine availability of the image block in which the target pixel point is located.
In a feasible implementation, the position of the candidate pixel point includes at least one of the following coordinate points in a rectangular coordinate system in which the position of the pixel at the top-left vertex of the to-be-processed image block is the origin, the straight line on which the upper edge of the to-be-processed image block is located is the horizontal axis with rightward as the positive horizontal direction, and the straight line on which the left edge of the to-be-processed image block is located is the vertical axis with downward as the positive vertical direction: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
In a feasible implementation, w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
In a feasible implementation, a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
In a feasible implementation, w×i is less than or equal to a first threshold.
In a feasible implementation, the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
In a feasible implementation, there are a plurality of candidate pixel points, and the plurality of candidate pixel points are available. The obtaining module 3302 is specifically configured to add the motion information corresponding to the plurality of available candidate pixel points to the candidate motion information set of the to-be-processed image block in a preset order, where, when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, where P is less than or equal to Q.
In a feasible implementation, the binary representation of the target identification information includes an encoded codeword of the target identification information.
In a feasible implementation, the preset order includes: an order of distances from short to long, where the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of a second candidate pixel point in the rectangular coordinate system; or an order from right to left; or an order from top to bottom; or a polyline-shaped order from top-right to bottom-left.
In a feasible implementation, the distance is the length of the straight line segment connecting the second candidate pixel point and the pixel at the bottom-left vertex position of the to-be-processed image block.
在一种可行的实施方式中,候选运动信息集合包括至少两个相同的运动信息。In a feasible implementation manner, the candidate motion information set includes at least two identical motion information.
在一种可行的实施方式中,所述获取模块3302具体用于:依次获取所述可用的目标像素点;确定当前获取的所述可用的目标像素点的运动信息与所述待处理图像块的候选运动信息集合中的运动信息不同;将所述具有不同运动信息的可用的目标像素点加入所述待处理图像块的候选运动信息集合。In a feasible implementation manner, the obtaining module 3302 is specifically configured to sequentially obtain the available target pixel points; determine the currently obtained motion information of the available target pixel points and the The motion information in the candidate motion information set is different; the available target pixels with different motion information are added to the candidate motion information set of the image block to be processed.
在一种可行的实施方式中,所述候选运动信息集合中的运动信息的个数小于或等于预设的第二阈值。In a feasible implementation manner, the number of motion information in the candidate motion information set is less than or equal to a preset second threshold.
在一种可行的实施方式中,所述计算模块3304具体用于:将所述目标运动信息作为所述待处理图像块的运动信息。In a feasible implementation manner, the calculation module 3304 is specifically configured to use the target motion information as the motion information of the image block to be processed.
在一种可行的实施方式中,所述装置3300用于解码所述待处理图像块,所述索引模块3303还用于:解析码流以获得目标运动残差信息;对应的,所述计算模块3104具体用于:组合所述目标运动信息和所述目标运动残差信息,以获得所述待处理图像块的运动信息。In a feasible implementation manner, the device 3300 is configured to decode the image block to be processed, and the indexing module 3303 is further configured to parse a code stream to obtain target motion residual information; correspondingly, the calculation module 3104 is specifically configured to: combine the target motion information and the target motion residual information to obtain motion information of the image block to be processed.
In a feasible implementation, the indexing module 3303 is specifically configured to parse the bitstream to obtain the target identification information.
In a feasible implementation, the apparatus 3300 is configured to encode the to-be-processed image block, and the obtaining module 3302 is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module 3303 is specifically configured to obtain the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
In a feasible implementation, the indexing module 3303 is further configured to encode the obtained target identification information.
In a feasible implementation, the indexing module 3303 is further configured to encode the target motion residual information.
FIG. 34 is a schematic structural block diagram of a motion information prediction device 3400 in an embodiment of this application. Specifically, the device includes a processor 3401 and a memory 3402 coupled to the processor; the processor 3401 is configured to perform the embodiment shown in FIG. 14 or FIG. 32 and the various feasible implementations.
Although specific aspects of this application have been described with respect to video encoder 100 and video decoder 200, it should be understood that the techniques of this application may be applied by many other video encoding and/or decoding units, processors, processing units, and hardware-based coding units such as encoder/decoders (CODECs) and the like. Moreover, it should be understood that the steps shown and described with respect to FIG. 14 and FIG. 32 are provided only as feasible implementations. That is, the steps shown in the feasible implementations of FIG. 14 and FIG. 32 need not necessarily be performed in the order shown, and fewer, additional, or alternative steps may be performed.
Moreover, it should be understood that, depending on the feasible implementation, certain actions or events of any of the methods described herein may be performed in a different sequence, and may be added, merged, or omitted altogether (for example, not all described actions or events are necessary to practice the methods). Furthermore, in certain feasible implementations, actions or events may be performed concurrently rather than sequentially, for example, through multi-threaded processing, interrupt processing, or multiple processors. In addition, although certain aspects of this application are described, for clarity, as being performed by a single module or unit, it should be understood that the techniques of this application may be performed by a combination of units or modules associated with a video decoder.
In one or more feasible implementations, the described functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. The computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium such as a data storage medium, or a communication medium, which includes any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol.
In this manner, a computer-readable medium may illustratively correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures used to implement the techniques described in this application. A computer program product may include a computer-readable medium.
As a feasible implementation and not a limitation, such a computer-readable storage medium may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this application may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in this application to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily need to be implemented by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.
The foregoing descriptions are merely exemplary specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Claims (77)
- A method for predicting motion information of an image block, comprising: obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block, the target pixels comprising a first candidate pixel adjacent to the to-be-processed image block and a second candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block; obtaining target identification information, wherein the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixels, and wherein, when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, N being less than or equal to M; and predicting the motion information of the to-be-processed image block according to the target motion information.
- The method according to claim 1, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The method according to claim 1 or 2, wherein the position of the second candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
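The coordinate scheme in the claim above can be sketched as follows. This sketch is illustrative only and not part of the claimed subject matter; the function name and the choice to enumerate a bounded range of i and j are assumptions made for demonstration:

```python
# Sketch of the second-candidate pixel positions defined in claim 3.
# Origin: the pixel at the top-left vertex of the to-be-processed block;
# x grows rightward and y grows downward, so candidates to the left of
# the block have a negative horizontal coordinate.
def second_candidate_positions(w, h, max_i, max_j):
    """Enumerate the claimed coordinate points for i in [1, max_i] and
    j in [0, max_j] (w and h are the preset positive integers)."""
    points = []
    for i in range(1, max_i + 1):
        points.append((-1, h * i - 1 + h))
        points.append((-1, h * i + h))
        for j in range(0, max_j + 1):
            points.append((-w * i,     h * j - 1))
            points.append((-w * i - 1, h * j - 1))
            points.append((-w * i,     h * j))
            points.append((-w * i - 1, h * j))
    return points

# For an 8x8 block, every generated point lies to the left of the block:
pts = second_candidate_positions(w=8, h=8, max_i=2, max_j=2)
assert all(x < 0 for x, y in pts)
```

With w = h = 8 and i = 1, the first pattern yields (-1, 15), i.e. a pixel in the column immediately left of the block and below its bottom edge, matching the "left of and not adjacent to" candidates of claim 1.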
- The method according to claim 3, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The method according to claim 3, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The method according to any one of claims 3 to 5, wherein w×i is less than or equal to a first threshold.
- The method according to claim 6, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
- The method according to any one of claims 1 to 7, wherein there are a plurality of second candidate pixels, and the obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block comprises: obtaining the plurality of second candidate pixels among the at least two target pixels in a preset order, wherein, when an earlier-obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, P being less than or equal to Q.
- The method according to claim 8, wherein the preset order comprises: a distance order from short to long, wherein the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the second candidate pixel in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the top right to the bottom left.
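The "short to long" option in the claim above orders candidates by the sum of absolute coordinates, i.e. the L1 distance from the block's top-left origin, so that nearer candidates come first and can receive shorter identification codewords. A minimal sketch, illustrative only and assuming the candidate positions are already available as (x, y) tuples:

```python
def order_by_l1_distance(candidates):
    """Sort candidate pixel positions by |x| + |y|, shortest first.
    Python's sort is stable, so candidates at equal distance keep
    their original relative order."""
    return sorted(candidates, key=lambda p: abs(p[0]) + abs(p[1]))

cands = [(-17, 0), (-1, 15), (-9, 7), (-1, 23)]
# distances: 17, 16, 16, 24 -> (-1, 15) sorts first (stable tie-break)
assert order_by_l1_distance(cands)[0] == (-1, 15)
```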
- The method according to any one of claims 1 to 9, wherein, among the obtained at least two target pixels, at least two target pixels have the same motion information.
- The method according to any one of claims 1 to 10, wherein the obtaining at least two target pixels having a preset positional relationship with a to-be-processed image block comprises: sequentially obtaining candidate pixels having the preset positional relationship with the to-be-processed image block; determining that the motion information of a currently obtained candidate pixel is different from the motion information of the already obtained target pixels; and using the candidate pixels having different motion information as the target pixels.
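The pruning step in the claim above keeps only candidates whose motion information differs from that of every target pixel already collected. The following sketch is illustrative only; it assumes motion information is comparable for equality (for example, a (mv_x, mv_y, ref_idx) tuple) and folds in the claim-12 cap on the number of target pixels:

```python
def collect_target_pixels(candidates, motion_info_of, max_count=None):
    """Scan candidate positions in order; keep a candidate only if its
    motion information exists and differs from that of all previously
    kept target pixels; optionally stop at a preset count."""
    targets, seen = [], []
    for pos in candidates:
        mi = motion_info_of(pos)
        if mi is None or mi in seen:      # unavailable or duplicate motion info
            continue
        targets.append(pos)
        seen.append(mi)
        if max_count is not None and len(targets) == max_count:
            break                          # claim 12: preset second threshold
    return targets

motion = {(-1, 7): (2, 0, 0), (-1, 15): (2, 0, 0), (-9, 7): (0, -1, 0)}
# (-1, 15) duplicates the motion info of (-1, 7) and is pruned
assert collect_target_pixels([(-1, 7), (-1, 15), (-9, 7)], motion.get) == [(-1, 7), (-9, 7)]
```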
- The method according to any one of claims 1 to 11, wherein the number of obtained target pixels is a preset second threshold.
- The method according to any one of claims 1 to 12, wherein the predicting the motion information of the to-be-processed image block according to the target motion information comprises: using the target motion information as the motion information of the to-be-processed image block.
- The method according to any one of claims 1 to 13, wherein the method is used to decode the to-be-processed image block, and further comprises: parsing a bitstream to obtain target motion residual information; and correspondingly, the predicting the motion information of the to-be-processed image block according to the target motion information comprises: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
- The method according to claim 14, wherein the obtaining target identification information comprises: parsing the bitstream to obtain the target identification information.
- The method according to any one of claims 1 to 15, wherein the method is used to encode the to-be-processed image block, and, before the obtaining target identification information, further comprises: determining the combination of target motion information and target motion residual information with the least coding cost; and correspondingly, the obtaining target identification information comprises: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- The method according to claim 16, further comprising: encoding the obtained target identification information.
- The method according to claim 16 or 17, further comprising: encoding the target motion residual information.
- A method for predicting motion information of an image block, comprising: determining the availability of at least one target pixel having a preset positional relationship with a to-be-processed image block, the target pixel comprising a candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block, wherein, when the prediction mode of the image block in which the target pixel is located is intra prediction, the target pixel is unavailable; adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block; obtaining target identification information, wherein the target identification information is used to determine target motion information from the candidate motion information set; and predicting the motion information of the to-be-processed image block according to the target motion information.
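The construction in the claim above can be sketched as: test each target pixel's availability (a pixel in an intra-coded block is unavailable), then add the motion information of the available pixels to the candidate set. This sketch is illustrative only and not part of the claimed subject matter; the dictionary representation of blocks, modes, and motion vectors is an assumption made for demonstration, and no deduplication is performed, consistent with the base claim:

```python
def build_candidate_set(target_pixels, block_of, max_size=None):
    """For each target pixel position, look up its containing block;
    skip it when the block is missing or intra-coded (the target pixel
    is unavailable), otherwise add the block's motion information to
    the candidate motion information set."""
    candidates = []
    for pos in target_pixels:
        blk = block_of(pos)
        if blk is None or blk["mode"] == "intra":
            continue                       # target pixel is unavailable
        candidates.append(blk["mv"])
        if max_size is not None and len(candidates) == max_size:
            break                          # claim 31: preset second threshold
    return candidates

blocks = {(-9, 7): {"mode": "inter", "mv": (1, 0)},
          (-17, 7): {"mode": "intra", "mv": None}}
# the intra-coded neighbour contributes nothing to the candidate set
assert build_candidate_set([(-9, 7), (-17, 7)], blocks.get) == [(1, 0)]
```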
- The method according to claim 19, wherein the determining the availability of at least one target pixel having a preset positional relationship with a to-be-processed image block comprises: determining the availability of the image block in which the target pixel is located.
- The method according to claim 19 or 20, wherein the position of the candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The method according to claim 21, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The method according to claim 21, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The method according to any one of claims 21 to 23, wherein w×i is less than or equal to a first threshold.
- The method according to claim 24, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the to-be-processed image block is located, or the first threshold is equal to twice the width of the CTU.
- The method according to any one of claims 19 to 25, wherein there are a plurality of candidate pixels and the plurality of candidate pixels are available, and the adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block comprises: adding the motion information corresponding to the plurality of available candidate pixels to the candidate motion information set of the to-be-processed image block in a preset order, wherein, when an earlier-obtained candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is Q, P being less than or equal to Q.
- The method according to any one of claims 19 to 26, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The method according to claim 26 or 27, wherein the preset order comprises: a distance order from short to long, wherein the distance is the sum of the absolute value of the horizontal coordinate and the absolute value of the vertical coordinate of the candidate pixel in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the top right to the bottom left.
- The method according to any one of claims 19 to 28, wherein the candidate motion information set includes at least two identical pieces of motion information.
- The method according to any one of claims 19 to 28, wherein the adding the motion information corresponding to the available target pixels to a candidate motion information set of the to-be-processed image block comprises: sequentially obtaining the available target pixels; determining that the motion information of a currently obtained available target pixel is different from the motion information in the candidate motion information set of the to-be-processed image block; and adding the available target pixels having different motion information to the candidate motion information set of the to-be-processed image block.
- The method according to any one of claims 19 to 30, wherein the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- The method according to any one of claims 19 to 31, wherein the predicting the motion information of the to-be-processed image block according to the target motion information comprises: using the target motion information as the motion information of the to-be-processed image block.
- The method according to any one of claims 19 to 32, wherein the method is used to decode the to-be-processed image block, and further comprises: parsing a bitstream to obtain target motion residual information; and correspondingly, the predicting the motion information of the to-be-processed image block according to the target motion information comprises: combining the target motion information and the target motion residual information to obtain the motion information of the to-be-processed image block.
- The method according to claim 33, wherein the obtaining target identification information comprises: parsing the bitstream to obtain the target identification information.
- The method according to any one of claims 19 to 34, wherein the method is used to encode the to-be-processed image block, and, before the obtaining target identification information, further comprises: determining the combination of target motion information and target motion residual information with the least coding cost; and correspondingly, the obtaining target identification information comprises: obtaining the identification information of the target motion information with the least coding cost among the at least two pieces of target motion information.
- The method according to claim 35, further comprising: encoding the obtained target identification information.
- The method according to claim 35 or 36, further comprising: encoding the target motion residual information.
- An apparatus for predicting motion information of an image block, comprising: an obtaining module, configured to obtain at least two target pixels having a preset positional relationship with a to-be-processed image block, the target pixels comprising a first candidate pixel adjacent to the to-be-processed image block and a second candidate pixel located to the left of the to-be-processed image block and not adjacent to the to-be-processed image block; an indexing module, configured to obtain target identification information, wherein the target identification information is used to determine target motion information from the motion information corresponding to the at least two target pixels, and wherein, when the first candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is N, and when the second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is M, N being less than or equal to M; and a calculation module, configured to predict the motion information of the to-be-processed image block according to the target motion information.
- The apparatus according to claim 38, wherein the binary representation of the target identification information comprises a coded codeword of the target identification information.
- The apparatus according to claim 38 or 39, wherein the position of the second candidate pixel comprises: at least one of the following coordinate points in a rectangular coordinate system whose origin is the position of the pixel at the top-left vertex of the to-be-processed image block, whose horizontal axis is the line on which the top edge of the to-be-processed image block lies, with the positive horizontal direction pointing right, and whose vertical axis is the line on which the left edge of the to-be-processed image block lies, with the positive vertical direction pointing down: (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The apparatus according to claim 40, wherein w is the width of the to-be-processed image block, and h is the height of the to-be-processed image block.
- The apparatus according to claim 40, wherein a motion vector field is obtained by sampling the motion information matrix corresponding to the picture in which the to-be-processed image block is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The apparatus according to any one of claims 40 to 42, wherein w×i is less than or equal to a first threshold.
- 根据权利要求43所述的装置,其特征在于,所述第一阈值等于所述待处理图像块所在的编码树单元CTU的宽度,或者,所述第一阈值等于所述CTU的宽度的2倍。The apparatus according to claim 43, wherein the first threshold is equal to a width of a coding tree unit CTU in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU .
- 根据权利要求38至44任一项所述的装置,其特征在于,所述第二候选像素点为多个,所述获取模块具体用于:按照所述预设顺序获取所述至少两个目标像素点中的多个第二候选像素点,其中,当所述在先获取的第二候选像素点对应所述目标运动信息时,所述目标标识信息的二进制表示的长度为P,当所述在后获取的第二候选像素点对应所述目标运动信息时,所述目标标识信息的二进制表示的长度为Q,P小于或等于Q。The device according to any one of claims 38 to 44, wherein the second candidate pixel point is multiple, and the obtaining module is specifically configured to obtain the at least two targets in the preset order. A plurality of second candidate pixels among the pixels, wherein when the previously obtained second candidate pixel corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when the When the second candidate pixel point obtained later corresponds to the target motion information, the length of the binary representation of the target identification information is Q, and P is less than or equal to Q.
- The apparatus according to claim 45, wherein the preset order comprises: a short-to-long distance order, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the second candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the upper right to the lower left.
- The apparatus according to any one of claims 38 to 46, wherein, among the obtained at least two target pixel points, the motion information of at least two target pixel points is the same.
- The apparatus according to any one of claims 38 to 47, wherein the obtaining module is specifically configured to: sequentially obtain candidate pixel points having the preset positional relationship with the image block to be processed; determine that the motion information of the currently obtained candidate pixel point differs from the motion information of the already obtained target pixel points; and use the candidate pixel points having different motion information as the target pixel points.
- The apparatus according to any one of claims 38 to 48, wherein the number of obtained target pixel points is a preset second threshold.
- The apparatus according to any one of claims 38 to 49, wherein the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- The apparatus according to any one of claims 38 to 50, wherein the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to parse a bitstream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- The apparatus according to claim 51, wherein the indexing module is specifically configured to parse the bitstream to obtain the target identification information.
- The apparatus according to any one of claims 38 to 52, wherein the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the least coding cost.
- The apparatus according to claim 53, wherein the indexing module is further configured to encode the obtained target identification information.
- The apparatus according to claim 53 or 54, wherein the indexing module is further configured to encode the target motion residual information.
- An apparatus for predicting motion information of an image block, comprising: a detection module, configured to determine the availability of at least one target pixel point having a preset positional relationship with an image block to be processed, the target pixel point including a candidate pixel point located to the left of the image block to be processed and not adjacent to the image block to be processed, wherein when the prediction mode of the image block in which the target pixel point is located is intra prediction, the target pixel point is unavailable; an obtaining module, configured to add the motion information corresponding to the available target pixel points to a candidate motion information set of the image block to be processed; an indexing module, configured to obtain target identification information used to determine target motion information from the candidate motion information set; and a calculation module, configured to predict the motion information of the image block to be processed according to the target motion information.
- The apparatus according to claim 56, wherein the detection module is specifically configured to determine the availability of the image block in which the target pixel point is located.
- The apparatus according to claim 56 or 57, wherein the position of the candidate pixel point comprises at least one of the following coordinate points in a rectangular coordinate system whose origin is the pixel at the top-left vertex of the image block to be processed, whose horizontal axis is the line on which the upper edge of the image block to be processed lies (rightward being the positive horizontal direction), and whose vertical axis is the line on which the left edge of the image block to be processed lies (downward being the positive vertical direction): (-1, h×i-1+h), (-1, h×i+h), (-w×i, h×j-1), (-w×i-1, h×j-1), (-w×i, h×j), (-w×i-1, h×j), where w and h are preset positive integers, i is a positive integer, and j is a non-negative integer.
- The apparatus according to claim 58, wherein w is the width of the image block to be processed and h is the height of the image block to be processed.
- The apparatus according to claim 58, wherein a motion vector field is obtained by sampling a motion information matrix corresponding to the image in which the image block to be processed is located, w is the sampling width interval of the motion vector field, and h is the sampling height interval of the motion vector field.
- The apparatus according to any one of claims 58 to 60, wherein w×i is less than or equal to a first threshold.
- The apparatus according to claim 61, wherein the first threshold is equal to the width of the coding tree unit (CTU) in which the image block to be processed is located, or the first threshold is equal to twice the width of the CTU.
- The apparatus according to any one of claims 56 to 62, wherein there are multiple candidate pixel points and the multiple candidate pixel points are available, and the obtaining module is specifically configured to add, in a preset order, the motion information corresponding to the multiple available candidate pixel points to the candidate motion information set of the image block to be processed, wherein when an earlier-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is P, and when a later-obtained candidate pixel point corresponds to the target motion information, the length of the binary representation of the target identification information is Q, with P less than or equal to Q.
- The apparatus according to any one of claims 56 to 63, wherein the binary representation of the target identification information comprises an encoded codeword of the target identification information.
- The apparatus according to claim 63 or 64, wherein the preset order comprises: a short-to-long distance order, where the distance is the sum of the absolute horizontal coordinate and the absolute vertical coordinate of the candidate pixel point in the rectangular coordinate system; or a right-to-left order; or a top-to-bottom order; or a zigzag (polyline) order from the upper right to the lower left.
- The apparatus according to any one of claims 56 to 65, wherein the candidate motion information set includes at least two pieces of identical motion information.
- The apparatus according to any one of claims 56 to 65, wherein the obtaining module is specifically configured to: sequentially obtain the available target pixel points; determine that the motion information of the currently obtained available target pixel point differs from the motion information in the candidate motion information set of the image block to be processed; and add the available target pixel points having different motion information to the candidate motion information set of the image block to be processed.
- The apparatus according to any one of claims 56 to 67, wherein the number of pieces of motion information in the candidate motion information set is less than or equal to a preset second threshold.
- The apparatus according to any one of claims 56 to 68, wherein the calculation module is specifically configured to use the target motion information as the motion information of the image block to be processed.
- The apparatus according to any one of claims 56 to 69, wherein the apparatus is configured to decode the image block to be processed, and the indexing module is further configured to parse a bitstream to obtain target motion residual information; correspondingly, the calculation module is specifically configured to combine the target motion information and the target motion residual information to obtain the motion information of the image block to be processed.
- The apparatus according to claim 70, wherein the indexing module is specifically configured to parse the bitstream to obtain the target identification information.
- The apparatus according to any one of claims 56 to 71, wherein the apparatus is configured to encode the image block to be processed, and the obtaining module is further configured to determine the combination of target motion information and target motion residual information with the least coding cost; correspondingly, the indexing module is specifically configured to obtain the identification information, among the at least two pieces of target motion information, of the target motion information with the least coding cost.
- The apparatus according to claim 72, wherein the indexing module is further configured to encode the obtained target identification information.
- The apparatus according to claim 72 or 73, wherein the indexing module is further configured to encode the target motion residual information.
- A device for predicting motion information of an image block, comprising a processor and a memory coupled to the processor, the processor being configured to execute the method according to any one of claims 1 to 55.
- A computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 55.
- A computer program product containing instructions which, when run on a computer, cause the computer to execute the method according to any one of claims 1 to 55.
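As an illustrative aside (not part of the claims), the candidate-pixel coordinate templates of claims 40 and 58 and the short-to-long distance order of claims 46 and 65 can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `candidate_positions` and the `max_j` bound on j are our own choices for illustration, and the claims themselves do not fix an enumeration algorithm.

```python
# Hypothetical sketch: enumerate the six coordinate templates of the
# claims for each (i, j), bound w*i by the first threshold (claims 43
# and 61), and order the points by the short-to-long distance rule
# (sum of absolute horizontal and vertical coordinates).

def candidate_positions(w, h, first_threshold, max_j=2):
    """Candidate pixel coordinates relative to the top-left vertex of
    the image block (origin; x grows rightward, y grows downward)."""
    points = []
    i = 1
    while w * i <= first_threshold:      # w*i <= first threshold
        for j in range(0, max_j + 1):    # j is a non-negative integer
            points.extend([
                (-1, h * i - 1 + h),
                (-1, h * i + h),
                (-w * i, h * j - 1),
                (-w * i - 1, h * j - 1),
                (-w * i, h * j),
                (-w * i - 1, h * j),
            ])
        i += 1
    # Deduplicate, then apply the short-to-long distance order:
    # distance = |x| + |y|, ascending.
    return sorted(set(points), key=lambda p: abs(p[0]) + abs(p[1]))
```

For a 4×4 block with the first threshold equal to the block width, the nearest point produced is (-4, 0), i.e. the pixel immediately left of the block's top row at the first non-adjacent column offset; points farther left or farther down receive later (and hence longer-codeword) positions, consistent with the P ≤ Q property of claims 45 and 63.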
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020024275A1 true WO2020024275A1 (en) | 2020-02-06 |
Family
ID=69232328
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/098581 WO2020024275A1 (en) | 2018-08-03 | 2018-08-03 | Inter-frame prediction method and device |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2020024275A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101605256A (en) * | 2008-06-12 | 2009-12-16 | 华为技术有限公司 | A kind of method of coding and decoding video and device |
CN104054350A (en) * | 2011-11-04 | 2014-09-17 | 诺基亚公司 | Method for video coding and an apparatus |
CN104539966A (en) * | 2014-09-30 | 2015-04-22 | 华为技术有限公司 | Image prediction method and relevant device |
- 2018-08-03 WO PCT/CN2018/098581 patent/WO2020024275A1/en active Application Filing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7211816B2 (en) | Intra-block copy-merge mode and padding for unavailable IBC reference areas | |
TWI846773B (en) | Triangle motion information for video coding | |
TWI843809B (en) | Signalling for merge mode with motion vector differences in video coding | |
CN109996081B (en) | Image prediction method, device and coder-decoder | |
JP2019508971A (en) | Predicting filter coefficients from fixed filters for video coding | |
JP2019519141A (en) | Signaling of filtering information | |
KR20130126688A (en) | Motion vector prediction | |
CN115361564B (en) | Motion vector acquisition method, device, equipment and computer readable storage medium | |
CN111200735B (en) | Inter-frame prediction method and device | |
WO2020048180A1 (en) | Motion vector acquisition method, device, computer equipment and storage medium | |
US20210203944A1 (en) | Decoding method and decoding apparatus for predicting motion information | |
KR20230098705A (en) | Sub-block temporal motion vector prediction for video coding | |
CN113170141B (en) | Inter-frame prediction method and related device | |
TW201921938A (en) | Adaptive GOP structure with future reference frame in random access configuration for video coding | |
CN111919439B (en) | Method and device for acquiring motion vector, method and device for constructing motion vector set and computer-readable storage medium | |
CN110546956B (en) | Inter-frame prediction method and device | |
CN110876057B (en) | Inter-frame prediction method and device | |
WO2020024275A1 (en) | Inter-frame prediction method and device | |
CN110855993A (en) | Method and device for predicting motion information of image block | |
WO2020038232A1 (en) | Method and apparatus for predicting movement information of image block | |
WO2020052653A1 (en) | Decoding method and device for predicted motion information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18928772; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18928772; Country of ref document: EP; Kind code of ref document: A1 |