CN114930836A - Method and apparatus for coordinated weighted prediction using non-rectangular fusion mode - Google Patents

Method and apparatus for coordinated weighted prediction using non-rectangular fusion mode

Info

Publication number
CN114930836A
Authority
CN
China
Prior art keywords: value, slice, indicator, equal, picture
Prior art date
Legal status
Granted
Application number
CN202180008825.6A
Other languages
Chinese (zh)
Other versions
CN114930836B (en)
Inventor
Alexey Konstantinovich Filippov
Huanbang Chen
Vasily Alexeevich Rufitskiy
Haitao Yang
Elena Alexandrovna Alshina
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211586724.XA (published as CN115988219B)
Publication of CN114930836A
Application granted
Publication of CN114930836B
Active legal status
Anticipated expiration

Classifications

    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, in particular:
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/172 Adaptive coding characterised by the coding unit being an image region, the region being a picture, frame or field
    • H04N19/174 Adaptive coding characterised by the coding unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 Adaptive coding characterised by the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184 Adaptive coding characterised by the coding unit being bits, e.g. of the compressed video stream
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/537 Motion estimation other than block-based
    • H04N19/577 Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H04N19/70 Characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

A decoding method is provided, including: obtaining a code stream of a current picture; obtaining a value of a first indicator of the current picture from the code stream, where the first indicator indicates a slice type; obtaining a value of a second indicator of the current picture from the code stream, where the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the code stream; when the value of the first indicator is equal to a first preset value and the value of the second indicator is equal to a second preset value, parsing the value of a weighted prediction parameter of a current block of a current slice of the current picture from the code stream, where the first preset value is an integer value and the second preset value is an integer value; and predicting the current block according to the value of the weighted prediction parameter.

Description

Method and apparatus for coordinated weighted prediction using non-rectangular fusion mode
Cross reference to related applications
This patent application claims priority to U.S. provisional patent application No. 62/960,134, filed on January 12, 2020. The entire disclosure of the above patent application is incorporated into the present application by reference.
Technical Field
Embodiments of the present application relate generally to the field of moving picture processing, and more particularly to non-rectangular partitioning modes combined with weighted prediction used for coding fades.
Background
Video coding (video encoding and decoding) is widely used in digital video applications such as broadcast digital television (TV), video transmission over the internet and mobile networks, real-time conversational applications such as video chat and video conferencing, DVD and Blu-ray discs, video content acquisition and editing systems, and camcorders for security applications.
Even relatively short videos require large amounts of video data to describe, which may cause difficulties when the data is streamed or otherwise transmitted over communication networks with limited bandwidth capacity. Therefore, video data typically needs to be compressed before being transmitted over modern telecommunication networks. The size of the video may also be a problem when the video is stored on a storage device, because memory resources may be limited. Video compression devices typically use software and/or hardware at the source side to encode the video data prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination side by a video decompression device that decodes the video data. With limited network resources and an ever-increasing demand for higher video quality, improved compression and decompression techniques are needed that can increase the compression ratio with little impact on image quality.
In particular, when coding pictures that exhibit luminance changes (fades), weighted prediction may be advantageous for motion compensation in the context of inter prediction. Non-rectangular partitions, such as the triangular partitioning/merge mode (TPM) and geometric (GEO) motion/merge partitions, can handle various types of motion better than inter prediction restricted to rectangular partitions only. However, for codecs that support both weighted prediction and non-rectangular partitioning, these tools need to be coordinated (harmonized).
Disclosure of Invention
Embodiments provide methods for coding a video sequence using a weighted prediction parameter that is a combination of a fade weighting parameter and a blending weighting parameter. The value of the fade weighting parameter is determined from the reference index value and the reference picture list, and the value of the blending weighting parameter is determined from the position of the prediction sample within the prediction block.
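As an illustrative, non-normative sketch of how these two weightings can combine (assuming a maximum blending weight $W$, e.g. $W = 8$ as in VVC-style TPM/GEO blending, and per-list fade weights and offsets $(w_i, o_i)$ derived from the reference index and reference picture list), a bi-predicted sample at position $(x, y)$ could take the form:

$$
P(x,y) \;=\; \frac{w_b(x,y)\,\bigl(w_0\,P_0(x,y) + o_0\bigr) \;+\; \bigl(W - w_b(x,y)\bigr)\,\bigl(w_1\,P_1(x,y) + o_1\bigr)}{W},
$$

where $w_b(x,y)$ is the position-dependent blending weight and $P_0$, $P_1$ are the motion-compensated predictions from reference picture lists 0 and 1.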
The above and other objects are achieved by the subject matter claimed in the independent claims. Further implementations are apparent from the dependent claims, the description, and the drawings.
In a first aspect, a decoding method is provided, where the method is implemented by a decoding device and includes the following steps:
obtaining a codestream of a current image (e.g., an encoded video sequence);
obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a first indicator of the current picture from the codestream, wherein the first indicator indicates a slice type (of a slice of the current picture);
obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a second indicator of the current picture from the codestream, wherein the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream;
when the value of the first indicator is equal to a first preset value (e.g., 1) and the value of the second indicator is equal to a second preset value (e.g., 2), parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current image, the first preset value is an integer value, and the second preset value is an integer value;
predicting the current block (generating a prediction block for the current block) according to the value of the weighted prediction parameter.
In this way, the value of a weighted prediction parameter is parsed (only) for the specific slice type indicated by the value of the first indicator and according to the value of the second indicator, which is valid for the entire current picture; when the second indicator indicates that the weighted prediction parameters are present in the picture header, inter prediction of the current block of the current slice may be performed using weighted prediction. Indicating (signaling) this at a higher level, i.e., at the picture header level, allows it to be decided explicitly whether weighted prediction is used for the blocks (in particular, all blocks) of a slice (in particular, all slices) of the current picture. If weighted prediction is used, non-rectangular partitioning may be disallowed.
In one implementation, the value of the first indicator is obtained from (e.g., parsed from) a picture header included in the codestream. In another implementation, the value of the second indicator is obtained from (e.g., parsed from) a picture parameter set included in the codestream. In yet another implementation, the value of the weighted prediction parameter is parsed from a picture header included in the codestream. In this way, all of the relevant values (or the syntax elements carrying these values) can be obtained from higher-level indications (i.e., in the picture header or the picture parameter set), so that the use of weighted inter prediction can be indicated efficiently.
In one implementation, the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice.
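A minimal parsing sketch of this condition is given below. It is only an illustration: the element names (inter_slice_allowed, wp_info_in_ph_flag, pred_weight_table) follow VVC-style naming and are assumptions, not the exact identifiers used by this application; `reader` is assumed to expose a read_flag() method.

```python
def parse_picture_header(reader, pps):
    """Illustrative sketch: parse picture-header fields relevant to weighted prediction (WP)."""
    ph = {}
    # "First indicator": the current picture may contain inter (B/P) slices.
    ph["inter_slice_allowed"] = reader.read_flag()
    # "Second indicator" (signalled in the PPS): WP parameters are carried in
    # the picture header (True) rather than in the slice headers (False).
    if ph["inter_slice_allowed"] and pps["wp_info_in_ph_flag"]:
        ph["pred_weight_table"] = parse_pred_weight_table(reader)
    return ph


def parse_pred_weight_table(reader):
    """Placeholder: would parse luma/chroma weights and offsets per reference index."""
    return {}
```

Because the decision is made once at the picture-header level, it applies to all blocks of all slices of the current picture.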
In a second aspect, a decoding method is provided, where the method is implemented by a decoding device and includes the following steps:
obtaining a codestream of a current image (e.g., an encoded video sequence);
obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a first indicator of the current picture from the codestream, wherein the first indicator indicates a slice type (of a slice of the current picture);
obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a second indicator of the current picture from the codestream, wherein the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream;
obtaining (e.g., by parsing a corresponding syntax element included in the code stream) a value of a third indicator of the current picture according to the code stream, where the third indicator indicates whether weighted prediction is applicable to an inter-slice whose slice type is a B-slice or a P-slice;
when the value of the first indicator is equal to a first preset value (e.g., 1), the value of the second indicator is equal to a second preset value (e.g., 1), and the value of the third indicator indicates that weighted prediction is applicable to the inter-slice, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value;
predicting the current block according to the value of the weighted prediction parameter (generating a prediction block for the current block).
According to the method of the second aspect, in various alternative implementations (which may be combined with each other), the value of the first indicator is obtained from (e.g., parsed from) a picture header included in the codestream, or the value of the second indicator is obtained from (e.g., parsed from) a picture parameter set included in the codestream, or the value of the weighted prediction parameter is obtained from (e.g., parsed from) a picture header included in the codestream, or the value of the third indicator is obtained from (e.g., parsed from) a picture parameter set included in the codestream.
In an implementation of the method according to the second aspect, the value of the first indicator being equal to the first preset value indicates that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice.
In another implementation of the method according to the second aspect, the third indicator having a value of 1 indicates that weighted prediction is applicable to the inter-slice.
The method of the second aspect and its implementation achieve similar or identical advantages as the method of the first aspect and its implementation.
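Building on the sketch above, the third indicator corresponds to flags that enable weighted prediction for P slices and weighted bi-prediction for B slices (named weighted_pred_flag and weighted_bipred_flag here by analogy with VVC; again an assumption, not this application's own naming):

```python
def wp_params_in_picture_header(ph, pps):
    """Illustrative second-aspect condition for parsing WP parameters from the picture header."""
    return (ph["inter_slice_allowed"]              # first indicator: inter (B/P) slices allowed
            and pps["wp_info_in_ph_flag"]          # second indicator: WP info carried in the PH
            and (pps["weighted_pred_flag"]         # third indicator: WP enabled for P slices ...
                 or pps["weighted_bipred_flag"]))  # ... or weighted bi-prediction for B slices
```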
Furthermore, a decoder is provided, wherein the decoder comprises processing circuitry for performing the method according to any of the above embodiments; there is also provided a computer program product, wherein the computer program product comprises program code for performing the method according to any of the above embodiments, when the computer program product is executed by a computer or a processor.
Furthermore, a decoder is provided, wherein the decoder comprises: one or more processors; a non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium is coupled with the processor and stores a program for execution by the processor, the program, when executed by the processor, causing the decoder to perform the method according to any of the above embodiments; a non-transitory computer readable medium is also provided, wherein the non-transitory computer readable medium carries program code which, when executed by a computer device, causes the computer device to perform the method of any of the above embodiments.
In addition, in another aspect, a non-transitory storage medium is provided, where the non-transitory storage medium includes a codestream decoded by the method in any of the foregoing embodiments.
In another aspect, an encoded bitstream for a video signal is provided, wherein the bitstream comprises a first indicator for a current picture indicating a slice type (of a slice of the current picture) and a second indicator for the current picture indicating whether a weighted prediction parameter is present in a picture header or a slice header of the bitstream. Specifically, when the value of the second indicator is equal to a second preset value (e.g., 2), the code stream further includes a weighted prediction parameter of a current block, where the current block is included in a current slice of the current picture, and the second preset value is an integer value.
In an implementation, the first indicator is included in a picture header, and/or the second indicator is included in a picture parameter set, and/or the weighted prediction parameter is included in a picture header included in the codestream.
In one implementation, the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice.
In another aspect, an encoded bitstream for a video signal is provided, wherein the bitstream includes a first indicator for a current picture, a second indicator for the current picture, and a third indicator for the current picture, the first indicator indicating a slice type (of a slice of the current picture), the second indicator indicating whether a weighted prediction parameter exists in a picture header or a slice header of the bitstream, and the third indicator indicating whether weighted prediction is applicable to an inter slice whose slice type is a B slice or a P slice. In particular, when the value of the second indicator is equal to a second preset value (for example, 1), the code stream further includes a weighted prediction parameter of a current block, wherein the current block is included in a current slice of the current picture, and the second preset value is an integer value.
In various alternative implementations, the first indicator is included in a picture header, and/or the second indicator is included in a picture parameter set, and/or the weighted prediction parameter is included in a picture header, and/or the third indicator is included in a picture parameter set included in the codestream.
In one implementation, the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice; and/or the third indicator has a value of 1, indicating that weighted prediction is applicable to the inter slice.
Furthermore, in another aspect, a non-transitory storage medium is provided, wherein the non-transitory storage medium includes an encoded codestream decoded (or to be decoded) by an image decoding apparatus, the codestream being generated by dividing a video signal or a frame of an image signal into a plurality of blocks, and the codestream including a first indicator for a current image and a second indicator for the current image, wherein the first indicator indicates a slice type (of a slice of the current image), and the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream. Specifically, when the value of the second indicator is equal to a second preset value (e.g., 2), the code stream further includes a weighted prediction parameter of a current block, where the current block is included in a current slice of the current picture, and the second preset value is an integer value.
In an implementation, the first indicator is included in a picture header, and/or the second indicator is included in a picture parameter set, and/or the weighted prediction parameter is included in a picture header included in the codestream.
In one implementation, the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice.
Further, in another aspect, there is provided a non-transitory storage medium, wherein the non-transitory storage medium includes an encoded codestream decoded (or to be decoded) by an image decoding apparatus, the codestream being generated by dividing a frame of a video signal or an image signal into a plurality of blocks, and the codestream includes a first indicator for a current picture, a second indicator for the current picture, and a third indicator for the current picture, wherein the first indicator indicates a slice type (of a slice of the current picture), the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream, and the third indicator indicates whether weighted prediction is applicable to an inter slice, the slice type of the inter slice being a B slice or a P slice. In particular, when the value of the second indicator is equal to a second preset value (for example, 1), the code stream further includes a weighted prediction parameter of a current block, wherein the current block is included in a current slice of the current picture, and the second preset value is an integer value.
In various alternative implementations, the first indicator is included in a picture header, and/or the second indicator is included in a picture parameter set, and/or the weighted prediction parameter is included in a picture header, and/or the third indicator is included in a picture parameter set included in the codestream.
In one implementation, the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice; and/or the third indicator has a value of 1, indicating that weighted prediction is applicable to the inter slice.
All of the above methods can be implemented by the decoding devices described below; these devices can be used to perform the steps of the corresponding methods described above and achieve the same advantages as set forth above. The following devices are provided:
a decoding device (for decoding an encoded video sequence), wherein the decoding device comprises:
a code stream obtaining unit, configured to obtain a code stream of a current image;
an indicator value obtaining unit operable to:
(a) obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
(b) obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header of the code stream;
an analysis unit configured to: when the value of the first indicator is equal to a first preset value (e.g., 1) and the value of the second indicator is equal to a second preset value (e.g., 1), parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current stripe of the current image, the first preset value is an integer value, and the second preset value is an integer value;
a prediction unit for predicting the current block according to the value of the weighted prediction parameter.
In the decoding device, the indicator value obtaining unit may be configured to obtain the value of the first indicator from a picture header included in the code stream or obtain the value of the second indicator from a picture parameter set included in the code stream; alternatively, the parsing unit may be configured to parse the value of the weighted prediction parameter from an image header included in the codestream.
The value of the first indicator is equal to the first preset value, and may indicate that the slice type of at least one slice included in the current picture is an inter-slice, for example, includes or is a B-slice or a P-slice.
Furthermore, a decoding device (for decoding an encoded video sequence) is provided, wherein the decoding device comprises:
a code stream obtaining unit, configured to obtain a code stream of a current image;
an indicator value obtaining unit configured to:
(a) obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
(b) obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header of the code stream;
(c) obtaining a value of a third indicator of the current image according to the code stream, wherein the third indicator indicates whether weighted prediction is applicable to an inter-frame slice, and the type of the inter-frame slice is a B slice or a P slice;
an analysis unit configured to: when the value of the first indicator is equal to a first preset value (e.g., 1), the value of the second indicator is equal to a second preset value (e.g., 1), and the value of the third indicator indicates that weighted prediction is applicable to the inter-slice, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value;
a prediction unit for predicting the current block according to the value of the weighted prediction parameter.
In the decoding apparatus, the indicator value obtaining unit may be configured to obtain a value of the first indicator according to a picture header included in the code stream, or obtain a value of the second indicator according to a picture parameter set included in the code stream, or obtain a value of the third indicator according to a picture parameter set included in the code stream; or the parsing unit may be configured to parse the weighted prediction parameters from an image header included in the codestream.
Also, the value of the first indicator, which is equal to the first preset value, may indicate that the slice type of at least one slice included in the current picture is an inter-slice, e.g., includes or is a B-slice or a P-slice.
The third indicator has a value of 1, which may indicate that weighted prediction is applicable to the inter slice.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Embodiments of the present invention are described in more detail below with reference to the accompanying drawings.
FIG. 1A is a block diagram of one example of a video coding system for implementing an embodiment of the present invention;
FIG. 1B is a block diagram of another example of a video coding system for implementing an embodiment of the present invention;
FIG. 2 is a block diagram of one example of a video encoder for implementing an embodiment of the present invention;
FIG. 3 is a block diagram of one exemplary architecture of a video decoder for implementing an embodiment of the present invention;
fig. 4 is a block diagram of one example of an encoding apparatus or a decoding apparatus;
fig. 5 is a block diagram of another example of an encoding apparatus or a decoding apparatus;
FIG. 6 is a flow chart of weighted prediction encoder-side decision making and parameter estimation;
FIG. 7 illustrates an example of a triangular prediction mode;
FIG. 8 illustrates an example of geometric prediction modes;
fig. 9 shows another example of geometric prediction modes;
fig. 10 is a block diagram of an exemplary structure of a content providing system 3100 implementing a content distribution service;
FIG. 11 is a block diagram of an exemplary architecture of a terminal device;
FIG. 12 is a block diagram of an example of an inter prediction method provided herein;
fig. 13 is a block diagram of an example of an inter prediction apparatus provided in the present application;
fig. 14 is a block diagram of another example of an inter prediction apparatus provided in the present application;
FIG. 15 is a flowchart of a decoding method according to an embodiment of the present invention;
FIGS. 16a and 16b are flow diagrams of a decoding method according to another embodiment;
fig. 17 shows a decoding device according to an embodiment.
In the following, the same reference numerals will refer to the same features or at least functionally equivalent features, if not explicitly stated otherwise.
Detailed Description
In the following description, reference is made to the accompanying drawings which form a part hereof and which show by way of illustration specific aspects of embodiments of the invention or which may be used in the practice of the invention. It is to be understood that embodiments of the present invention may be used in other respects, and may include structural or logical changes not depicted in the figures. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
For example, it should be understood that the disclosure relating to the described method is equally applicable to the corresponding device or system for performing the method, and vice versa. For example, if one or more specific method steps are described, the corresponding apparatus may include one or more units (e.g., functional units) to perform the described one or more method steps (e.g., one unit performs one or more steps, or multiple units perform one or more steps, respectively), even if such one or more units are not explicitly described or shown in the figures. On the other hand, for example, if a particular apparatus is described in terms of one or more units (e.g., functional units), the corresponding method may include one or more steps to implement the functionality of the one or more units (e.g., one step implements the functionality of the one or more units, or multiple steps implement the functionality of one or more units of the plurality of units, respectively), even if such one or more steps are not explicitly described or illustrated in the figures. Furthermore, it should be understood that features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless explicitly stated otherwise.
Video coding generally refers to the processing of a sequence of images that make up a video or video sequence. In the field of video coding, the terms "frame" and "picture" may be used as synonyms. Video coding (or coding in general) includes both video encoding and video decoding. Video encoding is performed on the source side, typically including processing (e.g., by compressing) the original video image to reduce the amount of data required to represent the video image (thereby enabling more efficient storage and/or transmission). Video decoding is performed at the destination side, typically involving the inverse process with respect to the encoder, for reconstructing the video image. In embodiments, "coding" of video images (or of pictures in general) should be understood as referring to the "encoding" or "decoding" of a video image or of the corresponding video sequence. The encoding part and the decoding part are also collectively referred to as a CODEC (COding and DECoding).
In the case of lossless video coding, the original video image can be reconstructed, i.e., the reconstructed video image has the same quality as the original video image (assuming no transmission loss or other data loss during storage or transmission). In the case of lossy video coding, further compression is performed by quantization or the like to reduce the amount of data required to represent the video image, whereas the decoder side cannot reconstruct the video image completely, i.e. the quality of the reconstructed video image is lower or worse compared to the quality of the original video image.
Several video coding standards belong to the group of "lossy hybrid video codecs" (i.e., spatial and temporal prediction in the sample domain is combined with 2D transform coding for quantization in the transform domain). Each picture of a video sequence is typically partitioned into a set of non-overlapping blocks, and coding is typically performed at the block level. In other words, the encoder side typically processes, i.e. encodes, the video at the block (video block) level, e.g. generates a prediction block by spatial (intra) prediction and/or temporal (inter) prediction, subtracts the prediction block from the current block (currently processed/block to be processed) to obtain a residual block, transforms the residual block in the transform domain and quantizes the residual block to reduce the amount of data to be transmitted (compressed); while the decoder side applies the inverse process with respect to the encoder to the encoded or compressed block to reconstruct the current block for representation. Furthermore, the processing loop of the encoder is the same as the processing loop of the decoder, such that the encoder and decoder generate the same prediction (e.g., intra prediction and inter prediction) and/or reconstruction for processing, i.e., decoding, subsequent blocks.
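The following schematic sketch summarizes the per-block hybrid coding loop described above, including the encoder-side reconstruction that mirrors the decoder. It is not taken from this application: the fixed quantization step and the SciPy-based 2D transform are illustrative assumptions only.

```python
import numpy as np
from scipy.fft import dctn, idctn

Q_STEP = 8.0  # illustrative fixed scalar quantization step


def encode_block(original, prediction):
    """Schematic per-block hybrid encoding step on NumPy arrays (illustration only)."""
    residual = original.astype(float) - prediction             # prediction error (sample domain)
    coeffs = np.round(dctn(residual, norm="ortho") / Q_STEP)   # 2D transform + scalar quantization
    # (the quantized coefficients would be entropy-encoded into the bitstream here)
    recon_residual = idctn(coeffs * Q_STEP, norm="ortho")      # dequantization + inverse transform
    # In-loop reconstruction: the encoder replicates the decoder so that both
    # predict subsequent blocks from identical reference samples.
    return prediction + recon_residual
```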
In the embodiments of video coding system 10 described below, video encoder 20 and video decoder 30 are described based on fig. 1-3.
Fig. 1A is a schematic block diagram of an exemplary coding system 10, for example, video coding system 10 (or simply coding system 10) may utilize the techniques of the present application. Video encoder 20 (or simply encoder 20) and video decoder 30 (or simply decoder 30) in video coding system 10 represent examples of devices that may be used to perform various techniques according to various examples described in this application.
As shown in fig. 1A, the coding system 10 includes a source device 12, the source device 12 being configured to provide encoded image data 21 to a destination device 14 or the like for decoding the encoded image data 21.
Source device 12 includes an encoder 20 and may additionally (i.e., optionally) include an image source 16, a pre-processor (or pre-processing unit) 18 (e.g., image pre-processor 18), and a communication interface or unit 22.
The image source 16 may include or may be any type of image capture device, such as a video camera for capturing real-world images; and/or any type of image generation device, such as a computer graphics processor for generating computer-animated images; or any type of other device for obtaining and/or providing real-world images, computer-animated images (e.g., screen content, Virtual Reality (VR) images), and/or any combination thereof (e.g., Augmented Reality (AR) images). The image source may be any type of memory (storage) that stores any of the above-described images.
In distinction to the preprocessing performed by preprocessor 18 (preprocessing unit 18), the image or image data 17 may also be referred to as the original image or original image data 17.
Preprocessor 18 is to receive (raw) image data 17 and perform preprocessing on image data 17 to obtain a preprocessed image 19 or preprocessed image data 19. The pre-processing performed by pre-processor 18 may include pruning (trimming), color format conversion (e.g., from RGB to YCbCr), toning or denoising, etc. It should be understood that the pre-processing unit 18 may be an optional component.
Video encoder 20 is operative to receive pre-processed image data 19 and provide encoded image data 21 (more details are described below in connection with fig. 2, etc.).
Communication interface 22 in source device 12 may be used to receive encoded image data 21 and send encoded image data 21 (or data resulting from further processing of encoded image data 21) over communication channel 13 to another device (e.g., destination device 14) or any other device for storage or direct reconstruction.
Destination device 14 includes a decoder 30 (e.g., video decoder 30), and may additionally (i.e., optionally) include a communication interface or communication unit 28, a post-processor 32 (or post-processing unit 32), and a display device 34.
Communication interface 28 in destination device 14 is used to receive encoded image data 21 (or data resulting from further processing of encoded image data 21) directly from source device 12 or from any other source device, such as an encoded image data storage device, and to provide encoded image data 21 to decoder 30.
Communication interface 22 and communication interface 28 may be used to send or receive encoded image data 21 or encoded data 21 via a direct communication link (e.g., a direct wired or wireless connection) between source device 12 and destination device 14 or via any type of network (e.g., a wired network, a wireless network, or any combination thereof, or any type of private and public network, or any type of combination thereof).
For example, communication interface 22 may be used to encapsulate encoded image data 21 into a suitable format (e.g., data packets) and/or process encoded image data by any type of transport encoding or processing for transmission over a communication link or communication network.
For example, communication interface 28, which corresponds to communication interface 22, may be used to receive transmitted data and process the transmitted data by any type of corresponding transmission decoding or processing and/or de-encapsulation, resulting in encoded image data 21.
Both communication interface 22 and communication interface 28 may be configured as unidirectional communication interfaces, as indicated by the arrows of communication channel 13 pointing from source device 12 to destination device 14 in fig. 1A, or as bidirectional communication interfaces, and may be used to send and receive messages, etc., to establish a connection, acknowledge and exchange any other information related to a communication link and/or data transmission (e.g., an encoded image data transmission), etc.
Decoder 30 is used to receive encoded image data 21 and provide decoded image data 31 or decoded image 31 (more details will be described below in connection with fig. 3 or 5, etc.).
Post-processor 32 of destination device 14 is to post-process decoded image data 31 (also referred to as reconstructed image data) (e.g., decoded image 31) to obtain post-processed image data 33 (e.g., post-processed image 33). For example, post-processing performed by post-processing unit 32 may include color format conversion (e.g., from YCbCr to RGB), toning, cropping, resampling, or any other processing for providing decoded image data 31 for display by display device 34 or the like, among other things.
A display device 34 in the destination device 14 is used to receive the post-processed image data 33 for displaying images to a user or viewer. The display device 34 may be or may include any type of display for representing the reconstructed image, such as an integrated or external display or screen. For example, the display may include a Liquid Crystal Display (LCD), an Organic Light Emitting Diode (OLED) display, a plasma display, a projector, a micro LED display, a liquid crystal on silicon (LCoS) display, a Digital Light Processor (DLP), or any other type of display.
Although source device 12 and destination device 14 are shown as separate devices in fig. 1A, device embodiments may also include both devices or both functions, i.e., both source device 12 or its corresponding function and destination device 14 or its corresponding function. In these embodiments, the source device 12 or its corresponding functionality and the destination device 14 or its corresponding functionality may be implemented using the same hardware and/or software, or using separate hardware and/or software, or any combination thereof.
It will be apparent to those skilled in the art from the foregoing description that the existence and (exact) functional division of the different elements or functions of the source device 12 and/or destination device 14 shown in fig. 1A may vary depending on the actual device and application.
Encoder 20 (e.g., video encoder 20) or decoder 30 (e.g., video decoder 30) or both encoder 20 and decoder 30 may be implemented by processing circuitry as shown in fig. 1B, such as one or more microprocessors, Digital Signal Processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, a video-encoding-dedicated processor, or any combination thereof. Encoder 20 may be implemented by processing circuitry 46 to embody the various modules described with reference to encoder 20 of fig. 2 and/or any other encoder system or subsystem described herein. Decoder 30 may be implemented by processing circuitry 46 to embody the various modules described with reference to decoder 30 of fig. 3 and/or any other decoder system or subsystem described herein. The processing circuitry may be used to perform various operations discussed below. If the techniques are implemented in part in software, as shown in FIG. 5, the device may store the instructions of the software in a suitable non-transitory computer-readable storage medium and the instructions may be executed in hardware by one or more processors to implement the techniques of this disclosure. Either of video encoder 20 and video decoder 30 may be integrated as part of a combined encoder/decoder (CODEC) in a single device, as shown in fig. 1B.
Source device 12 and destination device 14 may comprise any of a variety of devices, including any type of handheld or fixed device, such as a notebook or laptop computer, a cell phone, a smartphone, a tablet computer, a camcorder, a desktop computer, a set-top box, a television, a display device, a digital media player, a video game player, a video streaming device (e.g., a content service server or a content distribution server), a broadcast receiver device, a broadcast transmitter device, etc., and may use no operating system or any type of operating system. In some cases, source device 12 and destination device 14 may be equipped for wireless communication. Thus, source device 12 and destination device 14 may be wireless communication devices.
In some cases, the video coding system 10 shown in fig. 1A is merely exemplary, and the techniques of this application may be applicable to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding device and the decoding device. In other examples, the data is retrieved from local storage, streamed over a network, and so forth. A video encoding device may encode and store data into a memory, and/or a video decoding device may retrieve and decode data from a memory. In some examples, the encoding and decoding are performed by devices that do not communicate with each other, but simply encode data to memory and/or retrieve data from memory and decode the data.
For ease of description, embodiments of the present invention are described herein with reference to High-Efficiency Video Coding (HEVC), or to the reference software of Versatile Video Coding (VVC), the next-generation video coding standard, developed by the Joint Collaborative Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Motion Picture Experts Group (MPEG). Those of ordinary skill in the art will appreciate that embodiments of the present invention are not limited to HEVC or VVC.
Encoder and encoding method
Fig. 2 is a schematic block diagram of an exemplary video encoder 20 for implementing the techniques of this application. In the example of fig. 2, the video encoder 20 includes an input 201 (or input interface 201), a residual calculation unit 204, a transform processing unit 206, a quantization unit 208, an inverse quantization unit 210, an inverse transform processing unit 212, a reconstruction unit 214, a loop filter unit 220, a Decoded Picture Buffer (DPB) 230, a mode selection unit 260, an entropy encoding unit 270, and an output 272 (or output interface 272). The mode selection unit 260 may include an inter prediction unit 244, an intra prediction unit 254, and a partition unit 262. The inter prediction unit 244 may include a motion estimation unit and a motion compensation unit (not shown). The video encoder 20 shown in fig. 2 may also be referred to as a hybrid video encoder or a hybrid video codec-based video encoder.
The residual calculation unit 204, the transform processing unit 206, the quantization unit 208, and the mode selection unit 260 may constitute a forward signal path of the encoder 20, and the inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the buffer 216, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 may constitute a backward signal path of the video encoder 20, wherein the backward signal path of the video encoder 20 corresponds to a signal path of a decoder (see the video decoder 30 shown in fig. 3). The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 244, and the intra prediction unit 254 also constitute a "built-in decoder" of the video encoder 20.
Image and image segmentation (image and block)
For example, the encoder 20 may be configured to receive the image 17 (or the image data 17) via the input 201, e.g., an image of a sequence of images forming a video or a video sequence. The received image or image data may also be a pre-processed image 19 (or pre-processed image data 19). For simplicity, the following description refers to the image 17. Image 17 may also be referred to as a current image or an image to be coded (particularly in video coding, to distinguish the current image from other images, e.g., previously encoded and/or decoded images, of the same video sequence, i.e., the video sequence that also includes the current image).
The (digital) image is or can be considered as a two-dimensional array or matrix of samples having intensity values. The samples in the array may also be referred to as pixels (short for picture elements). The number of samples in the horizontal and vertical directions (or axes) of the array or image defines the size and/or resolution of the image. To represent color, three color components are typically employed, i.e., an image may be represented as or may include three arrays of samples. In an RGB format or color space, an image includes corresponding arrays of red, green, and blue samples. However, in video coding, each pixel is typically represented in luminance and chrominance format or color space, such as YCbCr, comprising one luminance component (sometimes also denoted L) represented by Y and two chrominance components represented by Cb and Cr. The luminance (luma) component Y represents the luminance or gray-scale intensity (e.g., as in a gray-scale image), while the two chrominance (chroma) components Cb and Cr represent the chrominance or color information components. Accordingly, an image in YCbCr format includes one luminance sample array composed of luminance sample values (Y) and two chrominance sample arrays composed of chrominance values (Cb and Cr). An image in RGB format may be converted or transformed into YCbCr format and vice versa, a process also known as color conversion or color transformation. If the image is black and white, the image may include only an array of intensity samples. Accordingly, for example, an image may be one array of luma samples in black and white format or one array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, and 4:4:4 color formats.
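For example, one common RGB-to-YCbCr conversion (the BT.601 form, given here purely as an illustration; other standards use different coefficients) is:

$$
Y = 0.299\,R + 0.587\,G + 0.114\,B, \qquad C_b = 0.564\,(B - Y), \qquad C_r = 0.713\,(R - Y).
$$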
In some embodiments, video encoder 20 may include an image segmentation unit (not shown in fig. 2) for segmenting image 17 into a plurality of (typically non-overlapping) image blocks 203. These blocks may also be referred to as root blocks, macroblocks (in the h.264/AVC standard), or Coding Tree Blocks (CTBs) or Coding Tree Units (CTUs) (in the h.265/HEVC and VVC standards). The image segmentation unit may be adapted to use the same block size for all images in the video sequence and to use a corresponding grid defining the block sizes, or to vary the block sizes between images or subsets or groups of images and to divide each image into a plurality of corresponding blocks.
In other embodiments, the video encoder may be configured to receive the blocks 203 of the image 17 directly, e.g., one, several, or all of the blocks that make up the image 17. The image block 203 may also be referred to as a current image block or an image block to be coded.
Similar to image 17, image block 203 is also or can be considered a two-dimensional array or matrix of pixels having intensity values (sample values), but the size of image block 203 is smaller than the size of image 17. In other words, depending on the color format applied, the block 203 may comprise, for example, one sample array (e.g., one luma array when the image 17 is a black and white image, or one luma array or one chroma array when the image 17 is a color image) or three sample arrays (e.g., one luma array and two chroma arrays when the image 17 is a color image) or any other number and/or type of arrays. The number of samples in the horizontal and vertical directions (or axes) of the block 203 defines the size of the block 203. Accordingly, one block may be an M × N (M columns × N rows) array of samples or an M × N array of transform coefficients, or the like.
In some embodiments, the video encoder 20 shown in fig. 2 may be used to encode the image 17 block-by-block, e.g., to encode and predict each block 203.
In some embodiments, the video encoder 20 shown in fig. 2 may also be used to segment and/or encode a picture using slices (also referred to as video slices), where a picture may be segmented or encoded using one or more slices (typically non-overlapping slices), and each slice may include one or more blocks (e.g., CTUs).
In some embodiments, the video encoder 20 shown in fig. 2 may also be configured to partition and/or encode a picture using tile groups (also referred to as video tile groups) and/or tiles (also referred to as video tiles), where the picture may be partitioned or encoded using one or more tile groups (typically non-overlapping tile groups), each tile group may include one or more blocks (e.g., CTUs) or one or more tiles, etc., and each tile may be, for example, rectangular and may include one or more blocks (e.g., CTUs), e.g., complete or partial blocks.
Residual calculation
The residual calculation unit 204 may be configured to calculate a residual block 205 (also referred to as a residual 205) from the image block 203 and a prediction block 265 (the prediction block 265 will be described in detail below) by, for example, subtracting sample values of the prediction block 265 from sample values of the image block 203 sample by sample (pixel by pixel) resulting in the residual block 205 in the sample domain.
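For illustration, the sample-by-sample subtraction described above can be sketched as follows; the helper name, the 8-bit sample assumption, and the row-by-row block layout are assumptions made only for this example:

#include <cstdint>
#include <vector>

// Sketch of the residual calculation for one block (assumption: 8-bit samples,
// block stored row by row). The residual may be negative, so a wider type is used.
std::vector<int16_t> computeResidual(const std::vector<uint8_t>& original,
                                     const std::vector<uint8_t>& prediction) {
    std::vector<int16_t> residual(original.size());
    for (size_t i = 0; i < original.size(); ++i) {
        // Subtract the prediction sample from the original sample, sample by sample.
        residual[i] = static_cast<int16_t>(original[i]) - static_cast<int16_t>(prediction[i]);
    }
    return residual;
}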
Transform
The transform processing unit 206 may be configured to perform a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST) on sample values of the residual block 205 to obtain transform coefficients 207 in a transform domain. The transform coefficients 207, which may also be referred to as transform residual coefficients, represent a residual block 205 in the transform domain.
The transform processing unit 206 may be used to apply integer approximations of DCT/DST, such as the transforms specified in h.265/HEVC. This integer approximation is typically scaled by a factor (scale) compared to the orthogonal DCT transform. To maintain the norm of the residual block processed by the forward and inverse transforms, other scaling factors are applied as part of the transform process. The scaling factor is typically selected according to certain constraints, e.g., the scaling factor is a power of 2 for a shift operation, the bit depth of the transform coefficients, a trade-off between precision and implementation cost, etc. For example, a specific scaling factor is specified for the inverse transform by the inverse transform processing unit 212 or the like (and for the corresponding inverse transform by the inverse transform processing unit 312 or the like on the video decoder 30 side); accordingly, the corresponding scaling factor may be specified for the forward transform at the encoder 20 side by the transform processing unit 206 or the like.
In some embodiments, video encoder 20 (accordingly, transform processing unit 206) may be configured to output transform parameters (e.g., the type of one or more transforms), e.g., directly or after encoding or compression by entropy encoding unit 270, such that video decoder 30 may receive and use these transform parameters for decoding.
Quantization
Quantization unit 208 may be configured to quantize transform coefficients 207 by applying scalar quantization or vector quantization, etc., resulting in quantized coefficients 209. Quantized coefficients 209 may also be referred to as quantized transform coefficients 209 or quantized residual coefficients 209.
The quantization process may reduce the bit depth associated with some or all of transform coefficients 207. For example, an n-bit transform coefficient may be rounded down to an m-bit transform coefficient during quantization, where n is greater than m, and the quantization level may be modified by adjusting a Quantization Parameter (QP). For example, for scalar quantization, different scaling may be performed to achieve finer or coarser quantization. The smaller the quantization step size is, the finer the corresponding quantization is; and the larger the quantization step size, the coarser the corresponding quantization. An appropriate quantization step size can be represented by a Quantization Parameter (QP). For example, the quantization parameter may be an index of a predefined set of suitable quantization steps. For example, a smaller quantization parameter may correspond to a finer quantization (smaller quantization step size) and a larger quantization parameter may correspond to a coarser quantization (larger quantization step size), or vice versa. The quantization may comprise a division by a quantization step size and a corresponding inverse quantization or dequantization, e.g. performed by the inverse quantization unit 210, or may comprise a multiplication by a quantization step size. In some embodiments, the quantization step size may be determined using a quantization parameter according to some standards, such as HEVC. In general, the quantization step size may be calculated from the quantization parameter using a fixed point approximation of an equation including division. Additional scaling factors may be introduced for quantization and dequantization to recover the norm of the residual block that may have been modified due to the use of scaling in the fixed-point approximation of the equation for the quantization step size and quantization parameter. In one exemplary implementation, the scaling of the inverse transform and dequantization may be combined. Alternatively, a custom quantization table may be used and indicated (signal) to the decoder by the encoder via a code stream or the like. Quantization is a lossy operation, where the larger the quantization step, the greater the loss.
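For illustration, the commonly cited HEVC-style relationship in which the quantization step size approximately doubles for each increase of the quantization parameter by 6 can be sketched as follows; the floating-point approximation and the function names are assumptions of this sketch and do not reproduce the fixed-point derivation mentioned above:

#include <cmath>

// Illustrative approximation (assumption): Qstep ~ 2^((QP - 4) / 6), so increasing
// the QP by 6 roughly doubles the step size and therefore coarsens the quantization.
double approxQuantStep(int qp) {
    return std::pow(2.0, (qp - 4) / 6.0);
}

// A simple scalar quantizer / dequantizer pair built on that step size.
int quantize(int coeff, double qstep)   { return static_cast<int>(std::lround(coeff / qstep)); }
int dequantize(int level, double qstep) { return static_cast<int>(std::lround(level * qstep)); }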
In some embodiments, video encoder 20 (accordingly, quantization unit 208) may be configured to output Quantization Parameters (QPs), e.g., directly or after encoding by entropy encoding unit 270, such that video decoder 30 may receive and decode using the quantization parameters.
Inverse quantization
The inverse quantization unit 210 is configured to perform inverse quantization of the quantization performed by the quantization unit 208 on the quantized coefficients, resulting in dequantized coefficients 211, e.g., an inverse quantization scheme of the quantization scheme performed by the quantization unit 208 according to or using the same quantization step as the quantization unit 208. Dequantized coefficients 211 may also be referred to as dequantized residual coefficients 211, corresponding to transform coefficients 207, but dequantized coefficients 211 are typically not exactly identical to transform coefficients due to losses caused by quantization.
Inverse transformation
The inverse transform processing unit 212 is configured to perform an inverse transform of the transform performed by the transform processing unit 206, such as an inverse Discrete Cosine Transform (DCT), an inverse Discrete Sine Transform (DST), or other inverse transform, to obtain a reconstructed residual block 213 (or corresponding dequantized coefficients 211) in the sample domain. The reconstructed residual block 213 may also be referred to as a transform block 213.
Reconstruction
The reconstruction unit 214 (e.g., adder or summer 214) is configured to add the transform block 213 (i.e., the reconstructed residual block 213) to the prediction block 265 by adding sample-by-sample the sample values of the reconstructed residual block 213 and the sample values of the prediction block 265 to obtain a reconstructed block 215 in the sample domain.
Filtering
Loop filter unit 220 (or simply "loop filter" 220) is used to filter reconstructed block 215 resulting in filtered block 221, or is typically used to filter reconstructed samples resulting in filtered sample values. For example, loop filter units are used to smooth pixel transitions or improve video quality. Loop filter unit 220 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening or smoothing filter, a collaborative filter, or any combination thereof. Although loop filter unit 220 is shown in fig. 2 as an in-loop filter, in other configurations, loop filter unit 220 may be implemented as a post-loop filter. The filtered block 221 may also be referred to as a filtered reconstruction block 221.
In some embodiments, video encoder 20 (correspondingly, loop filter unit 220) may be configured to output, e.g., directly or after encoding by entropy encoding unit 270, loop filter parameters (e.g., sample adaptive offset information) such that decoder 30 may receive and decode using the same or different loop filter parameters.
Decoded picture buffer
Decoded Picture Buffer (DPB) 230 may be a memory that stores reference pictures or, in general, reference picture data for use when video encoder 20 encodes video data. DPB 230 may be comprised of any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM), including Synchronous DRAM (SDRAM), Magnetoresistive RAM (MRAM), Resistive RAM (RRAM), or other types of memory devices. A Decoded Picture Buffer (DPB) 230 may be used to store the one or more filtered blocks 221. The decoded picture buffer 230 may also be used to store other previously filtered blocks (e.g., previously reconstructed and filtered blocks 221) in the same current picture or a different picture (e.g., a previously reconstructed picture), and may provide the complete previously reconstructed (i.e., decoded) picture (and corresponding reference blocks and samples) and/or a portion of the reconstructed current picture (and corresponding reference blocks and samples) for inter prediction, etc. The Decoded Picture Buffer (DPB) 230 may also be used to store one or more unfiltered reconstructed blocks 215, or, for example, generally unfiltered reconstructed samples if the reconstructed blocks 215 are not filtered by the loop filter unit 220, or reconstructed blocks or reconstructed samples that have not been subjected to any other processing.
Mode selection (segmentation and prediction)
Mode select unit 260 includes a partition unit 262, an inter-prediction unit 244, and an intra-prediction unit 254 to receive or obtain raw image data, such as raw block 203 (current block 203 of current image 17), and reconstructed image data, such as filtered and/or unfiltered reconstructed samples or reconstructed blocks of the same (current) image and/or one or more previously decoded images, from decoded image buffer 230 or other buffers (e.g., line buffers, not shown). The reconstructed image data is used as reference image data in prediction such as inter prediction or intra prediction to obtain a prediction block 265 or a prediction value 265.
The mode selection unit 260 may be configured to determine or select a partition and a prediction mode (e.g., intra-prediction mode or inter-prediction mode) for the current block prediction mode (including non-partitioned cases), and generate a corresponding prediction block 265 for calculating the residual block 205 and reconstructing the reconstructed block 215.
In some embodiments, mode selection unit 260 may be used to select the partitioning and prediction modes (e.g., select from those modes supported or available by mode selection unit 260). The partitioning and prediction modes provide the best match or minimum residual (minimum residual means better compression for transmission or storage), or provide minimum signaling overhead (minimum signaling overhead means better compression for transmission or storage), or both. The mode selection unit 260 may be configured to determine the segmentation and prediction modes according to Rate Distortion Optimization (RDO), i.e. to select the prediction mode providing the smallest rate distortion. Herein, the terms "best," "minimum," "optimal," and the like do not necessarily refer to "best," "minimum," "optimal," and the like as a whole, but may also refer to situations where termination or selection criteria are met, e.g., where a certain value exceeds or falls below a threshold or other limit, which may result in "suboptimal selection," but which may reduce complexity and processing time.
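A minimal sketch of the rate-distortion based selection described above is given below; the Lagrangian cost J = D + λ·R, the candidate structure, and the function name are illustrative assumptions rather than the syntax of any particular encoder:

#include <limits>
#include <vector>

struct ModeCandidate {
    int    modeId;      // e.g., an intra or inter prediction mode index
    double distortion;  // D: e.g., SSE between original and reconstructed block
    double rate;        // R: estimated number of bits for coding the block in this mode
};

// Select the candidate with the smallest Lagrangian cost J = D + lambda * R.
int selectModeRdo(const std::vector<ModeCandidate>& candidates, double lambda) {
    double bestCost = std::numeric_limits<double>::max();
    int    bestMode = -1;
    for (const auto& c : candidates) {
        const double cost = c.distortion + lambda * c.rate;
        if (cost < bestCost) { bestCost = cost; bestMode = c.modeId; }
    }
    return bestMode;
}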
In other words, the partitioning unit 262 may be used to partition the block 203 into smaller block partitions (partitions) or sub-blocks (which again form blocks), for example, by iteratively using quad-tree partitioning (QT), binary-tree partitioning (BT), or ternary-tree partitioning (TT), or any combination thereof, and to predict each of the partitions or sub-blocks, for example, wherein the mode selection includes selecting the tree structure of the partitioned block 203 and selecting the prediction mode applied to each partition or sub-block.
The partitioning (e.g., performed by partitioning unit 262) and prediction processing (performed by inter-prediction unit 244 and intra-prediction unit 254) performed by exemplary video encoder 20 will be described in detail below.
Segmentation
The segmentation unit 262 may segment (or divide) the current block 203 into smaller segmented blocks, e.g., smaller blocks of square or rectangular size. These smaller blocks (which may also be referred to as sub-blocks) may be further partitioned into even smaller partitioned blocks. This is also referred to as tree splitting or hierarchical tree splitting, where, for example, a root block at root tree level 0 (hierarchical level 0, depth 0) may be recursively split into two or more next lower level blocks, for example, into nodes at tree level 1 (hierarchical level 1, depth 1). These blocks may again be split into two or more next lower level blocks, e.g., tree level 2 (hierarchical level 2, depth 2) blocks, until the split ends (because an end criterion is reached, e.g., maximum tree depth or minimum block size is reached). The blocks that are not further partitioned are also referred to as leaf blocks or leaf nodes of the tree. A tree divided into two blocks is called a binary-tree (BT), a tree divided into three blocks is called a ternary-tree (TT), and a tree divided into four blocks is called a Quadtree (QT).
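The recursive splitting described above can be illustrated by the following sketch; the split-type decision callback and the termination criteria (maximum depth, minimum block size) are placeholders for whatever criteria an encoder actually applies:

#include <vector>

enum class Split { NONE, QT, BT_HOR, BT_VER, TT_HOR, TT_VER };

struct Block { int x, y, w, h, depth; };

// Recursively partition a block until no further split is chosen or an end
// criterion (maximum depth / minimum size) is reached. decideSplit() stands in
// for the encoder-side decision (e.g., the RDO of the mode selection unit).
void partition(const Block& b, int maxDepth, int minSize,
               Split (*decideSplit)(const Block&), std::vector<Block>& leaves) {
    Split s = (b.depth >= maxDepth || b.w <= minSize || b.h <= minSize)
                  ? Split::NONE : decideSplit(b);
    if (s == Split::NONE) { leaves.push_back(b); return; }  // leaf block / leaf node
    std::vector<Block> children;
    switch (s) {
        case Split::QT:      // four equally sized sub-blocks
            for (int i = 0; i < 4; ++i)
                children.push_back({b.x + (i % 2) * b.w / 2, b.y + (i / 2) * b.h / 2,
                                    b.w / 2, b.h / 2, b.depth + 1});
            break;
        case Split::BT_HOR:  // two sub-blocks, horizontal split
            children.push_back({b.x, b.y,           b.w, b.h / 2, b.depth + 1});
            children.push_back({b.x, b.y + b.h / 2, b.w, b.h / 2, b.depth + 1});
            break;
        case Split::BT_VER:  // two sub-blocks, vertical split
            children.push_back({b.x,           b.y, b.w / 2, b.h, b.depth + 1});
            children.push_back({b.x + b.w / 2, b.y, b.w / 2, b.h, b.depth + 1});
            break;
        case Split::TT_HOR:  // three sub-blocks with 1:2:1 heights
            children.push_back({b.x, b.y,               b.w, b.h / 4, b.depth + 1});
            children.push_back({b.x, b.y + b.h / 4,     b.w, b.h / 2, b.depth + 1});
            children.push_back({b.x, b.y + 3 * b.h / 4, b.w, b.h / 4, b.depth + 1});
            break;
        case Split::TT_VER:  // three sub-blocks with 1:2:1 widths
            children.push_back({b.x,               b.y, b.w / 4, b.h, b.depth + 1});
            children.push_back({b.x + b.w / 4,     b.y, b.w / 2, b.h, b.depth + 1});
            children.push_back({b.x + 3 * b.w / 4, b.y, b.w / 4, b.h, b.depth + 1});
            break;
        default: break;
    }
    for (const auto& c : children) partition(c, maxDepth, minSize, decideSplit, leaves);
}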
As mentioned above, the term "block" as used herein may be a portion of an image, in particular a square or rectangular portion. For example, with reference to HEVC and VVC, a block may be or may correspond to a Coding Tree Unit (CTU), a Coding Unit (CU), a Prediction Unit (PU), and a Transform Unit (TU), and/or to a plurality of corresponding blocks, such as a Coding Tree Block (CTB), a Coding Block (CB), a Transform Block (TB), or a Prediction Block (PB).
For example, a Coding Tree Unit (CTU) may be or may include one CTB of luma samples and two corresponding CTBs of chroma samples in a picture with three sample arrays, or may be or may include one CTB of samples in a black and white picture or a picture coded using three separate color planes and syntax structures for coding the samples. Accordingly, a Coding Tree Block (CTB) may be a block of N × N samples, where N may be set to a certain value such that one component is divided into a plurality of CTBs, which is a division manner. A Coding Unit (CU) may be or may comprise one coded block of luma samples and two corresponding coded blocks of chroma samples in a picture with three arrays of samples or may be or may comprise one coded block of samples in a black and white picture or a picture coded using three separate color planes and syntax structures for coding the samples. Accordingly, a Coding Block (CB) may be an M × N sample block, where M and N may be set to a certain value, so that one CTB is divided into a plurality of coding blocks, which is a division manner.
In some embodiments, e.g., according to HEVC, a Coding Tree Unit (CTU) may be partitioned into CUs by a quadtree structure represented as a coding tree. Whether an image region is coded by inter (temporal) prediction or intra (spatial) prediction is decided at the CU level. Each CU may be further divided into one, two, or four PUs according to the PU division type. The same prediction process will be performed within one PU and the relevant information is sent to the decoder in units of PU. After performing the prediction process according to the PU partition type to obtain a residual block, the CU may be partitioned into Transform Units (TUs) according to other quadtree structures similar to the coding tree of the CU.
In some embodiments, for example, the coded blocks are partitioned using combined quad-tree and binary tree (QTBT) partitioning according to the latest video coding standard currently being developed, known as Versatile Video Coding (VVC). In the QTBT block structure, a CU may be square or rectangular. For example, a Coding Tree Unit (CTU) is first divided by a quadtree structure. The quadtree leaf nodes are then further partitioned by a binary tree or ternary (triple) tree structure. The partitioning tree leaf nodes are called Coding Units (CUs), and this segmentation is used for prediction and transform processing without any further partitioning. That is, in the QTBT coding block structure, the block sizes of CU, PU, and TU are the same. Meanwhile, multiple partitions such as ternary tree partitions can be used together with the QTBT block structure.
In one example, mode selection unit 260 in video encoder 20 may be used to perform any combination of the segmentation techniques described herein.
As described above, video encoder 20 is used to determine or select the best or optimal prediction mode from a (e.g., predetermined) set of prediction modes. For example, the prediction mode set may include an intra prediction mode and/or an inter prediction mode, and the like.
Intra prediction
The set of intra prediction modes may include 35 different intra prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in HEVC, or may include 67 different intra prediction modes, e.g., non-directional modes such as DC (or mean) mode and planar mode, or directional modes as defined in VVC.
The intra prediction unit 254 is configured to generate the intra prediction block 265 according to an intra prediction mode in the intra prediction mode set by using reconstructed samples of neighboring blocks in the same current picture.
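As a simple illustration of such prediction from reconstructed neighboring samples, the sketch below builds a DC-mode prediction block from the samples above and to the left of the current block; the helper is hypothetical and omits the boundary-availability and filtering details of any particular standard:

#include <cstdint>
#include <vector>

// Hypothetical DC intra prediction sketch: every sample of the N x N prediction
// block is set to the average of the reconstructed neighbors above and to the left.
std::vector<uint8_t> predictDc(const std::vector<uint8_t>& above,  // N reconstructed samples above
                               const std::vector<uint8_t>& left,   // N reconstructed samples to the left
                               int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += above[i] + left[i];
    const uint8_t dc = static_cast<uint8_t>((sum + n) / (2 * n));   // rounded average
    return std::vector<uint8_t>(static_cast<size_t>(n) * n, dc);    // fill the block with the DC value
}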
Intra-prediction unit 254 (or generally mode selection unit 260) is also used to output intra-prediction parameters (or generally information representative of the selected intra-prediction mode for the block) to entropy encoding unit 270 in the form of syntax elements 266 for inclusion into encoded image data 21 so that video decoder 30 may, for example, receive and use the prediction parameters for decoding.
Inter-frame prediction
The set of (possible) inter prediction modes depends on the available reference pictures (i.e. at least part of the decoded pictures stored in the DPB 230, e.g. as described above) and other inter prediction parameters, e.g. on whether the entire reference picture or only a part of the reference picture (e.g. the search window area around the area of the current block) is used to search for the best matching reference block, and/or e.g. on whether a pixel interpolation is performed, e.g. a half-pixel and/or quarter-pixel interpolation.
In addition to the prediction mode described above, a skip mode and/or a direct mode may be used.
The inter prediction unit 244 may include a Motion Estimation (ME) unit and a Motion Compensation (MC) unit (both not shown in fig. 2). The motion estimation unit may be configured to receive or obtain an image block 203 (a current image block 203 of the current image 17) and the decoded image 231, or at least one or more previously reconstructed blocks (e.g., reconstructed blocks of one or more other/different previously decoded images 231) for motion estimation. For example, the video sequence may comprise a current picture and a previously decoded picture 231, in other words, the current picture and the previously decoded picture 231 may be part of or constitute a series of pictures, which constitute the video sequence.
For example, the encoder 20 may be configured to select a reference block from a plurality of reference blocks of the same or different one of a plurality of other images and provide the reference image (or reference image index) and/or an offset (spatial offset) between the position (x-coordinate, y-coordinate) of the reference block and the position of the current block as an inter prediction parameter to the motion estimation unit. This offset is also called a Motion Vector (MV).
The motion compensation unit is configured to obtain (e.g., receive) inter-prediction parameters, and perform inter-prediction according to or using the inter-prediction parameters to obtain an inter-prediction block 265. The motion compensation performed by the motion compensation unit may include acquiring or generating a prediction block from a motion/block vector determined through motion estimation, and may further include performing interpolation on sub-pixel precision. Interpolation filtering may generate other pixel samples from known pixel samples, thereby potentially increasing the number of candidate prediction blocks that may be used to code an image block. After receiving the motion vector corresponding to the PU of the current image block, the motion compensation unit may locate the prediction block pointed to by the motion vector in one of the reference picture lists.
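The following sketch illustrates the idea of generating additional (sub-pixel) samples from known integer-position samples; a simple bilinear interpolation at quarter-pixel precision is used here purely for illustration, whereas the standards apply longer separable interpolation filters:

#include <cstdint>
#include <vector>

// Illustrative bilinear interpolation at quarter-pixel precision. 'ref' is a
// reference picture plane of size w x h, (x, y) is the integer position and
// (fx, fy) the fractional offset in units of 1/4 pixel (0..3).
uint8_t interpolateSample(const std::vector<uint8_t>& ref, int w, int h,
                          int x, int y, int fx, int fy) {
    auto at = [&](int px, int py) -> int {
        px = px < 0 ? 0 : (px >= w ? w - 1 : px);   // clamp to the picture border
        py = py < 0 ? 0 : (py >= h ? h - 1 : py);
        return ref[static_cast<size_t>(py) * w + px];
    };
    const int a = at(x, y),     b = at(x + 1, y);
    const int c = at(x, y + 1), d = at(x + 1, y + 1);
    const int top    = a * (4 - fx) + b * fx;        // horizontal weighting
    const int bottom = c * (4 - fx) + d * fx;
    return static_cast<uint8_t>((top * (4 - fy) + bottom * fy + 8) >> 4);  // vertical weighting with rounding
}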
Motion compensation unit may also generate syntax elements related to the block and the video slice for use by video decoder 30 in decoding an image block of the video slice. The group of partitions and/or the partitions and the corresponding syntax elements may be generated or used in addition to or instead of the stripes and the corresponding syntax elements.
Entropy coding
Entropy encoding unit 270 is configured to apply an entropy encoding algorithm or scheme (e.g., a Variable Length Coding (VLC) scheme, a Context Adaptive VLC (CAVLC) scheme, an arithmetic coding scheme, binarization, Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or another entropy encoding method or technique), or a bypass (no compression), to the quantized coefficients 209, inter-frame prediction parameters, intra-frame prediction parameters, loop filter parameters, and/or other syntax elements, to obtain encoded image data 21 that may be output via output 272 in the form of an encoded codestream 21 or the like, such that video decoder 30 may receive and use these parameters for decoding, or the like. The encoded codestream 21 may be transmitted to the video decoder 30 or stored in memory for subsequent transmission or retrieval by the video decoder 30.
Other structural variations of video encoder 20 may be used to encode the video stream. For example, the non-transform based encoder 20 may directly quantize the residual signal of certain blocks or frames without the transform processing unit 206. In another implementation, the encoder 20 may include the quantization unit 208 and the inverse quantization unit 210 combined into a single unit.
Decoder and decoding method
Fig. 3 shows an example of a video decoder 30 for implementing the techniques of the present application. The video decoder 30 is configured to receive encoded image data 21 (e.g., encoded codestream 21), e.g., encoded by the encoder 20, resulting in a decoded image 331. The encoded image data or codestream includes information for decoding the encoded image data, such as data representing image blocks of the encoded video slice (and/or groups or partitions) and associated syntax elements.
In the example of fig. 3, the decoder 30 includes an entropy decoding unit 304, an inverse quantization unit 310, an inverse transform processing unit 312, a reconstruction unit 314 (e.g., a summer 314), a loop filter 320, a decoded picture buffer (DPB) 330, a mode application unit 360, an inter prediction unit 344, and an intra prediction unit 354. The inter prediction unit 344 may be or may include a motion compensation unit. In some examples, video decoder 30 may perform a decoding process that is substantially reciprocal to the encoding process described with reference to video encoder 20 in fig. 2.
The inverse quantization unit 210, the inverse transform processing unit 212, the reconstruction unit 214, the loop filter 220, the Decoded Picture Buffer (DPB) 230, the inter prediction unit 344, and the intra prediction unit 354 also constitute a "built-in decoder" of the video encoder 20, as described with reference to the encoder 20. Accordingly, the inverse quantization unit 310 may have the same function as the inverse quantization unit 210, the inverse transform processing unit 312 may have the same function as the inverse transform processing unit 212, the reconstruction unit 314 may have the same function as the reconstruction unit 214, the loop filter 320 may have the same function as the loop filter 220, and the decoded picture buffer 330 may have the same function as the decoded picture buffer 230. Accordingly, the explanations of the corresponding units and functions of the video encoder 20 apply correspondingly to the corresponding units and functions of the video decoder 30.
Entropy decoding
The entropy decoding unit 304 is configured to parse the code stream 21 (or generally referred to as encoded image data 21), perform entropy decoding on the encoded image data 21, and the like, to obtain quantized coefficients 309 and/or decoded encoding parameters (not shown in fig. 3), such as any or all of inter-prediction parameters (e.g., reference image indexes and motion vectors), intra-prediction parameters (e.g., intra-prediction modes or indexes), transform parameters, quantization parameters, loop filter parameters, and/or other syntax elements. Entropy decoding unit 304 may be used to apply a decoding algorithm or scheme corresponding to the encoding scheme described with reference to entropy encoding unit 270 in encoder 20. Entropy decoding unit 304 may also be used to provide inter-prediction parameters, intra-prediction parameters, and/or other syntax elements to mode application unit 360, as well as other parameters to other units of decoder 30. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level. In addition to or instead of a slice and a respective syntax element, a group of partitions and/or a partition and a respective syntax element may be received and/or used.
Inverse quantization
Inverse quantization unit 310 may be configured to receive Quantization Parameters (QPs) (or information generally referred to as inverse quantization related information) and quantization coefficients from encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304, etc.), and inverse quantize decoded quantization coefficients 309 according to the quantization parameters to obtain dequantized coefficients 311, where dequantized coefficients 311 may also be referred to as transform coefficients 311. The inverse quantization process may include: the quantization parameter determined by video encoder 20 for each video block in a video slice (or block or group of blocks) is used to determine the degree of quantization, as well as the degree of inverse quantization that needs to be applied.
Inverse transformation
The inverse transform processing unit 312 may be configured to receive the dequantized coefficients 311 (also referred to as transform coefficients 311) and transform the dequantized coefficients 311 to obtain a reconstructed residual block 313 in the sample domain. The reconstructed residual block 313 may also be referred to as a transform block 313. The transform may be an inverse transform, such as an inverse DCT, an inverse DST, an inverse integer transform, or a conceptually similar inverse transform process. Inverse transform processing unit 312 may also be used to receive transform parameters or corresponding information from encoded image data 21 (e.g., parsed and/or decoded by entropy decoding unit 304, etc.) to determine the transform to be performed on dequantized coefficients 311.
Reconstruction
The reconstruction unit 314 (e.g., adder or summer 314) may be used to add sample values of the reconstructed residual block 313 to sample values of the prediction block 365, e.g., by adding the reconstructed residual block 313 to the prediction block 365 to result in a reconstructed block 315 in the sample domain.
Filtering
Loop filter unit 320 (in or after the coding loop) may be used to filter reconstruction block 315, resulting in filtered block 321, e.g., to smooth pixel transitions or otherwise improve video quality, etc. Loop filter unit 320 may include one or more loop filters, such as a deblocking filter, a sample-adaptive offset (SAO) filter, or one or more other filters, such as a bilateral filter, an Adaptive Loop Filter (ALF), a sharpening or smoothing filter, a collaborative filter, or any combination thereof. Although loop filter unit 320 is shown in fig. 3 as an in-loop filter, in other configurations, loop filter unit 320 may be implemented as a post-loop filter.
Decoded picture buffer
Decoded video block 321 of the picture is then stored in decoded picture buffer 330, where decoded picture buffer 330 stores decoded picture 331 as a reference picture for subsequent motion compensation and/or output and display of other pictures.
The decoder 30 is used to output the decoded image 331, via an output or the like, for presentation to or viewing by a user.
Prediction
The inter prediction unit 344 may function as the inter prediction unit 244 (particularly, a motion compensation unit), and the intra prediction unit 354 may function as the intra prediction unit 254, and perform a partitioning or partitioning decision and perform prediction according to a partitioning manner and/or prediction parameters or corresponding information received from the encoded image data 21 (e.g., parsed and/or decoded by the entropy decoding unit 304, etc.). The mode application unit 360 may be configured to perform prediction (intra prediction or inter prediction) on each block according to the reconstructed image, block or corresponding samples (filtered or unfiltered), resulting in a prediction block 365.
When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 354 in mode application unit 360 is used to generate prediction block 365 for an image block of the current video slice according to an intra-prediction mode indicated (signal) and data from previously decoded blocks of the current image. When the video image is coded as an inter-coded (e.g., B or P) slice, inter prediction unit 344 (e.g., a motion compensation unit) of mode application unit 360 is used to generate prediction block 365 for the video blocks of the current video slice according to the motion vectors and other syntax elements received from entropy decoding unit 304. For inter prediction, the prediction blocks may be generated from one of the reference pictures in one of the reference picture lists. Video decoder 30 may construct reference frame list0 and list1 using preset construction techniques based on the reference pictures stored in DPB 330. In addition to or instead of stripes (e.g., video stripes), the same or similar processes may be applied to or by embodiments that use groups of partitions (e.g., video groups of partitions) and/or chunks (e.g., video chunks), e.g., video may be coded using I, P or B groups of partitions and/or chunks.
The mode application unit 360 is configured to determine prediction information for a video block of a current video slice by parsing motion vectors or related information and other syntax elements, and generate a prediction block for the current video block being decoded using the prediction information. For example, the mode application unit 360 uses some of the received syntax elements to determine a prediction mode (e.g., intra prediction or inter prediction) for coding the video blocks of the video slice, an inter prediction slice type (e.g., B-slice, P-slice, or GPB-slice), construction information for one or more reference picture lists for the slice, a motion vector for each inter-coded video block of the slice, an inter prediction state for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice. In addition to or instead of stripes (e.g., video stripes), the same or similar process may be applied to or by embodiments that use groups of partitions (e.g., groups of video partitions) and/or partitions (e.g., video partitions), e.g., video may be coded using I, P or B groups of partitions and/or partitions.
In some embodiments, the video decoder 30 shown in fig. 3 may be used to partition and/or decode a picture using slices (also referred to as video slices), where a picture may be partitioned or decoded using one or more slices (typically non-overlapping slices), and each slice may include one or more blocks (e.g., CTUs).
In some embodiments, the video decoder 30 shown in fig. 3 may be configured to partition and/or decode a picture using a group of blocks (also referred to as a video block group) and/or a block (also referred to as a video block), where the picture may be partitioned or decoded using one or more groups of blocks (typically non-overlapping groups of blocks), each group of blocks may include one or more blocks (e.g., CTUs) or one or more blocks, etc., each block may be, for example, rectangular, and may include one or more blocks (e.g., CTUs), e.g., whole or partial blocks.
Other variations of video decoder 30 may be used to decode encoded image data 21. For example, the decoder 30 can generate an output video stream without the loop filter unit 320. For example, non-transform-based decoder 30 may directly inverse quantize the residual signal for certain blocks or frames without inverse transform processing unit 312. In another implementation, video decoder 30 may include inverse quantization unit 310 and inverse transform processing unit 312 combined into a single unit.
It should be understood that in the encoder 20 and the decoder 30, the processing result of the current step may be further processed and then output to the next step. For example, after interpolation filtering, motion vector derivation, or loop filtering, the processing result of interpolation filtering, motion vector derivation, or loop filtering may be further operated, such as performing a clip (clip) or shift (shift) operation.
It should be noted that the derived motion vector of the current block (including but not limited to the control point motion vector in affine mode, sub-block motion vector in affine mode, planar mode, ATMVP mode, temporal motion vector, etc.) may be further operated on. For example, the value of a motion vector is limited to a predefined range according to its representation bits. If the representation bit depth of the motion vector is bitDepth, the value of the motion vector ranges from -2^(bitDepth-1) to 2^(bitDepth-1)-1, where the symbol ^ represents the power. For example, if bitDepth is set equal to 16, the range is -32768 to 32767; if bitDepth is set equal to 18, the range is -131072 to 131071. For example, the values of the derived motion vectors (e.g., the MVs of four 4 × 4 sub-blocks in an 8 × 8 block) are restricted such that the maximum difference between the integer parts of the MVs of the four 4 × 4 sub-blocks does not exceed N pixels, e.g., 1 pixel. Two methods of limiting the motion vector according to bitDepth are provided herein.
Method 1: remove the overflow most significant bit (MSB) by a wrapping operation
ux = ( mvx + 2^bitDepth ) % 2^bitDepth    (1)
mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux    (2)
uy = ( mvy + 2^bitDepth ) % 2^bitDepth    (3)
mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy    (4)
Wherein mvx represents the horizontal component of the motion vector of an image block or sub-block; mvy represents the vertical component of the motion vector of an image block or sub-block; ux and uy represent intermediate values.
For example, if the value of mvx is -32769, the value obtained after using equations (1) and (2) is 32767. In a computer system, decimal numbers are stored as two's complement. The two's complement of -32769 is 1,0111,1111,1111,1111 (17 bits); the MSB is discarded, resulting in the two's complement 0111,1111,1111,1111 (32767 in decimal), which is the same as the output obtained after using equations (1) and (2).
ux = ( mvpx + mvdx + 2^bitDepth ) % 2^bitDepth    (5)
mvx = ( ux >= 2^(bitDepth-1) ) ? ( ux - 2^bitDepth ) : ux    (6)
uy = ( mvpy + mvdy + 2^bitDepth ) % 2^bitDepth    (7)
mvy = ( uy >= 2^(bitDepth-1) ) ? ( uy - 2^bitDepth ) : uy    (8)
These operations may also be performed during the summation of the motion vector predictor (mvp) and the motion vector difference (mvd), as shown in equations (5) to (8).
Method 2: clip the value to remove the overflow MSB
vx = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vx )
vy = Clip3( -2^(bitDepth-1), 2^(bitDepth-1) - 1, vy )
Wherein vx represents the horizontal component of the motion vector of the image block or sub-block; vy denotes the vertical component of the motion vector of the image block or sub-block; x, y and z correspond to three input values of the MV clipping process, respectively; the function Clip3 is defined as follows:
Clip3( x, y, z ) = x, if z < x
Clip3( x, y, z ) = y, if z > y
Clip3( x, y, z ) = z, otherwise
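For illustration, both limiting methods can be expressed as small helpers consistent with equations (1) to (8) and the Clip3 definition above; the function names and the use of 64-bit intermediates are assumptions of this sketch:

#include <cstdint>

// Method 1: remove the overflow MSB by wrapping mv into the
// [-2^(bitDepth-1), 2^(bitDepth-1) - 1] range, as in equations (1) to (4).
int64_t wrapMv(int64_t mv, int bitDepth) {
    const int64_t range = 1LL << bitDepth;             // 2^bitDepth
    const int64_t u = (mv + range) % range;            // intermediate value ux / uy
    return (u >= (range >> 1)) ? (u - range) : u;
}

// Clip3 as defined above: returns x if z < x, y if z > y, and z otherwise.
int64_t clip3(int64_t x, int64_t y, int64_t z) {
    return (z < x) ? x : (z > y) ? y : z;
}

// Method 2: clip the value directly to the representable range.
int64_t clipMv(int64_t mv, int bitDepth) {
    const int64_t lo = -(1LL << (bitDepth - 1));
    const int64_t hi =  (1LL << (bitDepth - 1)) - 1;
    return clip3(lo, hi, mv);
}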
fig. 4 is a schematic diagram of a video coding apparatus 400 according to an embodiment of the present invention. The video coding device 400 is suitable for implementing the disclosed embodiments described herein. In one embodiment, the video coding device 400 may be a decoder (e.g., the video decoder 30 of fig. 1A) or an encoder (e.g., the video encoder 20 of fig. 1A).
The video coding apparatus 400 includes: an ingress port 410 (or input port 410) and a receiving unit (Rx) 420 for receiving data; a processor, logic unit, or Central Processing Unit (CPU) 430 for processing data; a transmission unit (Tx) 440 and an egress port 450 (or output port 450) for transmitting data; and a memory 460 for storing data. The video coding apparatus 400 may further include an optical-to-electrical (OE) component and an electrical-to-optical (EO) component coupled to the ingress port 410, the receiving unit 420, the transmission unit 440, and the egress port 450, serving as an egress or ingress of optical or electrical signals.
The processor 430 may be implemented by hardware and software. Processor 430 may be implemented as one or more CPU chips, one or more cores (e.g., a multi-core processor), one or more FPGAs, one or more ASICs, and one or more DSPs. Processor 430 is in communication with ingress port 410, receiving unit 420, transmission unit 440, egress port 450, and memory 460. Processor 430 includes a coding module 470. The coding module 470 implements the disclosed embodiments described above. For example, the coding module 470 performs, processes, prepares, or provides various coding operations. Thus, the inclusion of the coding module 470 may substantially improve the functionality of the video coding apparatus 400 and affect the transition of the video coding apparatus 400 to a different state. Optionally, the coding module 470 is implemented as instructions stored in the memory 460 and executed by the processor 430.
Memory 460 may include one or more disks, one or more tape drives, and one or more solid state drives, and may serve as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. For example, the memory 460 may be volatile and/or non-volatile, and may be a read-only memory (ROM), a Random Access Memory (RAM), a ternary content-addressable memory (TCAM), and/or a Static Random Access Memory (SRAM).
Fig. 5 is a simplified block diagram of an apparatus 500 provided by an exemplary embodiment, wherein the apparatus 500 may be used as either or both of the source device 12 and the destination device 14 in fig. 1.
The processor 502 in the apparatus 500 may be a central processing unit. Alternatively, processor 502 may be any other type of device or devices now known or later developed that is capable of operating or processing information. Although the disclosed implementations may be implemented using a single processor, such as processor 502 as shown, speed and efficiency may be improved by using multiple processors.
In one implementation, the memory 504 in the apparatus 500 may be a Read Only Memory (ROM) device or a Random Access Memory (RAM) device. Any other suitable type of storage device may be used for memory 504. The memory 504 may include code and data 506 that the processor 502 accesses over the bus 512. The memory 504 may also include an operating system 508 and application programs 510, the application programs 510 including at least one program that causes the processor 502 to perform the methods described herein. For example, applications 510 may include applications 1 through N, and may also include a video coding application that performs the methods described herein.
The apparatus 500 may also include one or more output devices, such as a display 518. In one example, display 518 may be a touch-sensitive display that combines a display with a touch-sensitive element that can be used to sense touch input. A display 518 may be coupled to the processor 502 via the bus 512.
Although the bus 512 in the apparatus 500 is described herein as a single bus, the bus 512 may include multiple buses. Further, the secondary memory 514 may be directly coupled to other components in the apparatus 500 or may be accessible over a network and may comprise a single integrated unit, e.g., one memory card, or multiple units, e.g., multiple memory cards. Thus, the apparatus 500 may be implemented in a variety of configurations.
As described in the J.M. Boyce paper "Weighted prediction in the H.264/MPEG AVC video coding standard" (IEEE International Symposium on Circuits and Systems, May 2004, Canada, beginning at page 789), the Weighted Prediction (WP) tool is used in the main profile and the extended profiles of the h.264 video coding standard, and weighted prediction is performed by applying multiplicative weighting factors and an additive offset to the motion compensated prediction, thereby improving the decoding efficiency. In the explicit mode, the weighting factors and offsets may be coded into the slice header corresponding to each allowable reference picture index. In the implicit mode, the weighting factors are not coded, but are derived from the relative Picture Order Count (POC) distance of the two reference pictures. The experimental results provided show how much the decoding efficiency is improved when WP is used. When decoding a fade-to-black sequence, the code rate is reduced by up to 67%.
When WP is applied for unidirectional prediction, as in P pictures, WP is similar to the leaky prediction previously proposed for error resilience. Leaky prediction is a special case of WP, where the scaling factor is limited to the range 0 ≤ α ≤ 1. WP in h.264 can use not only negative scaling factors but also scaling factors larger than 1. The weighting factors are applied pixel-by-pixel through the coded tag field to efficiently compress the covered and uncovered areas. One key difference between the WP tool in h.264 and previous proposals involving weighted prediction for improving compression efficiency is that the association between reference picture indices and weighting factor parameters allows these parameters to be indicated efficiently in a multiple reference picture environment. As described in the paper "Accurate parameter estimation and efficient fade detection for weighted prediction in H.264 video compression" by R. Zhang and G. Cote (IEEE International Conference on Image Processing, October 2008, San Diego, CA, USA), the flow of applying WP in a real-time encoding system can be formulated as the series of steps shown in fig. 6. First, some statistics 611 are generated by performing video analysis 610. Next, fade detection is performed using the statistics 611 within a small window of several images preceding the current image. Each picture is assigned a state value 631, where the state value 631 indicates whether the picture is in a NORMAL state or a FADE state. The state value is saved for each image. When encoding a picture, WP is used for a current picture and reference picture pair if the current picture or one of its reference pictures is in the FADE state. In step 650, the statistics of the current image and the corresponding reference image are processed to estimate the WP parameters. These parameters are then sent to the encoding engine 660. Otherwise, if the picture is in the NORMAL state, normal encoding is carried out.
As described in the article "Weighted prediction methods for improved motion compensation" by a.leontaris and a.m.tourapis (16 th IEEE International Image Processing, ICIP), 11 months 2009, egypt, page 1029-. For each macroblock partition block, the reference block is selected from the available reference lists (often denoted RefPicList in the specification), i.e., from reference list0 for P-coded slices or B-coded slices or reference list1 for B-coded slices. The reference picture used for each segment may be different. From these reference pictures, a prediction block is generated for each list, i.e. a prediction block P under unidirectional list prediction or a prediction block P under bidirectional prediction, using motion information optionally with sub-pixel precision O And P 1 . These prediction blocks may further process the availability of the current slice according to weighted prediction. For P slices, the WP parameters are transmitted in the slice header. For the B band, there are two options: for explicit WP, these parameters are in the slice headerCarrying out transmission; for implicit WP, these parameters are derived from the Picture Order Count (POC) indicated in the slice header. Only the case of explicit WP and how this approach is used to improve motion compensation performance will be focused on herein. Note that in HEVC and VVC, PB is used in a similar manner to macroblock partitions in AVC.
For explicit WP with unidirectional (single-list) prediction, in P slices or B slices, the prediction block is derived from a single reference picture. Let p denote a sample value in the prediction block P. If no weighted prediction is used, the final inter prediction sample is f = p. Otherwise, the prediction sample is:
f = ( ( p × wx + 2^(logWD-1) ) >> logWD ) + ox
item w x And o x Indicating the WP gain and offset parameters of the reference list x, respectively. The term logWD is transmitted in the code stream and controls the mathematical precision of the weighted prediction process. When logWD ≧ 1, the above expression rounds off in the direction deviating from zero. Similarly, two prediction blocks should be considered in bi-prediction, one for each reference list. Let p be 0 And p 1 Representing two prediction blocks P 0 And P 1 Of (2). If weighted prediction is not used, the following prediction is performed:
f = ( p0 + p1 + 1 ) >> 1

When bidirectional weighted prediction is performed, the following prediction is performed:

f = ( ( p0 × w0 + p1 × w1 + 2^logWD ) >> ( logWD + 1 ) ) + ( ( o0 + o1 + 1 ) >> 1 )
it should be noted that the weighted prediction can compensate for luminance variations, such as fade-in, fade-out, or cross-fade.
In VVC, weighted prediction is indicated at higher levels, in the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS), and the slice header. In the SPS, the following syntax elements are used:
-SPS _ weighted _ pred _ flag equal to 1 indicates that weighted prediction can be applied to P slices referring to said SPS; SPS _ weighted _ pred _ flag equal to 0 means that weighted prediction is not applied to P slices referring to the SPS.
-SPS _ weighted _ bipred _ flag equal to 1 indicates that explicit weighted prediction can be applied to B slices referring to said SPS; SPS _ weighted _ bipred _ flag equal to 0 means that explicit weighted prediction is not applied to B slices referring to the SPS.
In PPS, the following syntax elements are used:
-PPS _ weighted _ pred _ flag equal to 0 means that weighted prediction is not applied to P slices referring to said PPS; PPS _ weighted _ pred _ flag equal to 1 means that weighted prediction is applied to P slices referring to the PPS. When sps _ weighted _ pred _ flag is equal to 0, the value of pps _ weighted _ pred _ flag should be equal to 0.
-PPS _ weighted _ bipred _ flag equal to 0 means that explicit weighted prediction does not apply to B slices referring to said PPS; PPS _ weighted _ bipred _ flag equal to 1 means that explicit weighted prediction is applied to B slices referring to the PPS.
When sps _ weighted _ bipred _ flag is equal to 0, the value of pps _ weighted _ bipred _ flag should be equal to 0.
In the slice header, the weighted prediction parameters are indicated as pred _ weight _ table (), where the structure of pred _ weight _ table () is shown in table 1 and includes the following elements:
luma_log2_weight_denom represents the base-2 logarithm of the denominator of all luma weighting factors. The value of luma_log2_weight_denom may range from 0 to 7 (including leading and trailing values).
delta _ chroma _ log2_ weight _ denom represents the difference of the base 2 logarithms of the denominators of all the chroma weighting factors. When delta _ chroma _ log2_ weight _ denom is not present, then delta _ chroma _ log2_ weight _ denom is inferred to be equal to 0.
The variable ChromaLog2WeightDenom is derived to be equal to luma_log2_weight_denom + delta_chroma_log2_weight_denom, and its value may range from 0 to 7 (including leading and trailing values).
luma _ weight _ l0_ flag [ i ] equal to 1 indicates the presence of weighting factors for the luma components predicted from list0 by RefPicList [0] [ i ], and luma _ weight _ l0_ flag [ i ] equal to 0 indicates the absence of these weighting factors.
chroma _ weight _ l0_ flag [ i ] equal to 1 indicates the presence of weighting factors for chroma predictors for list0 prediction by RefPicList [0] [ i ], and chroma _ weight _ l0_ flag [ i ] equal to 0 indicates the absence of these weighting factors. When chroma _ weight _ l0_ flag [ i ] is not present, then chroma _ weight _ l0_ flag [ i ] is inferred to be equal to 0.
delta _ luma _ weight _ l0[ i ] represents the difference in weighting factors applied to luma predictors predicted for list0 by RefPicList [0] [ i ].
The variable LumaWeightL0[ i ] is derived to be equal to ( 1 << luma_log2_weight_denom ) + delta_luma_weight_l0[ i ]. When luma_weight_l0_flag[ i ] is equal to 1, the value of delta_luma_weight_l0[ i ] may range from -128 to 127 (including leading and trailing values). When luma_weight_l0_flag[ i ] is equal to 0, LumaWeightL0[ i ] is inferred to be equal to 2^luma_log2_weight_denom.
luma _ offset _ l0[ i ] represents an additional offset applied to luma predictors predicted from list0 by RefPicList [0] [ i ]. The value of luma _ offset _ l0[ i ] may range from-128 to 127 (including leading and trailing values). When luma _ weight _ l0_ flag [ i ] is equal to 0, then luma _ offset _ l0[ i ] is inferred to be equal to 0.
delta _ chroma _ weight _ l0[ i ] [ j ] represents the difference in weighting factors applied to chroma predictors for list0 prediction by RefPicList [0] [ i ], where j equals 0 for Cb and 1 for Cr.
The variable ChromaWeightL0[ i ][ j ] is derived to be equal to ( 1 << ChromaLog2WeightDenom ) + delta_chroma_weight_l0[ i ][ j ]. When chroma_weight_l0_flag[ i ] is equal to 1, the value of delta_chroma_weight_l0[ i ][ j ] may range from -128 to 127 (including leading and trailing values). When chroma_weight_l0_flag[ i ] is equal to 0, ChromaWeightL0[ i ][ j ] is inferred to be equal to 2^ChromaLog2WeightDenom.
delta _ chroma _ offset _ l0[ i ] [ j ] represents the difference of the additional offset applied to the chroma predictor for list0 prediction by RefPicList [0] [ i ], where j equals 0 for Cb and 1 for Cr.
The variable ChromaOffsetl0[ i ] [ j ] is derived as follows:
ChromaOffsetL0[i][j]=Clip3(-128,127,(128+delta_chroma_offset_l0[i][j]-((128*ChromaWeightL0[i][j])>>ChromaLog2WeightDenom)))
The value of delta_chroma_offset_l0[ i ][ j ] may range from -4 × 128 to 4 × 127 (including leading and trailing values). When chroma_weight_l0_flag[ i ] is equal to 0, ChromaOffsetL0[ i ][ j ] is inferred to be equal to 0.
luma_weight_l1_flag[ i ], chroma_weight_l1_flag[ i ], delta_luma_weight_l1[ i ], luma_offset_l1[ i ], delta_chroma_weight_l1[ i ][ j ], and delta_chroma_offset_l1[ i ][ j ] have the same semantics as luma_weight_l0_flag[ i ], chroma_weight_l0_flag[ i ], delta_luma_weight_l0[ i ], luma_offset_l0[ i ], delta_chroma_weight_l0[ i ][ j ], and delta_chroma_offset_l0[ i ][ j ], respectively, where l0, L0, list 0, and List0 are replaced by l1, L1, list 1, and List1, respectively.
The variable sumWeightL0Flags is derived as equal to the sum of luma _ weight _ l0_ flag [ i ] +2 × chroma _ weight _ l0_ flag [ i ], where i is 0.. NumRefIdxActive [0] -1.
When slice _ type is equal to B, then the derived variable sumWeightL1Flags is equal to the sum of luma _ weight _ l1_ flag [ i ] +2 × chroma _ weight _ l1_ flag [ i ], where i is 0.. NumRefIdxActive [1] -1.
The requirement of code stream consistency is as follows: SumWeight L0Flags should be less than or equal to 24 when slice _ type is equal to P; when slice _ type is equal to B, the sum of sumWeightL0Flags and sumWeightL1Flags should be less than or equal to 24.
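For illustration, the derivations of LumaWeightL0, ChromaWeightL0, and ChromaOffsetL0 described above can be collected into a small helper; the structure and function names below are hypothetical, and only the list-0 weights and chroma offsets are covered:

#include <cstdint>

// Illustrative recovery of the list-0 weights from the parsed pred_weight_table()
// syntax elements described above; member names mirror the spec variables.
struct WpParamsL0 {
    int lumaWeight;       // LumaWeightL0[ i ]
    int chromaWeight[2];  // ChromaWeightL0[ i ][ j ], j = 0 for Cb, 1 for Cr
    int chromaOffset[2];  // ChromaOffsetL0[ i ][ j ]
};

int clip3i(int x, int y, int z) { return z < x ? x : (z > y ? y : z); }

WpParamsL0 deriveWpParamsL0(int lumaLog2WeightDenom, int chromaLog2WeightDenom,
                            bool lumaFlag, int deltaLumaWeight,
                            bool chromaFlag, const int deltaChromaWeight[2],
                            const int deltaChromaOffset[2]) {
    WpParamsL0 p{};
    // LumaWeightL0 = (1 << denom) + delta when the flag is 1, else inferred to 2^denom.
    p.lumaWeight = lumaFlag ? (1 << lumaLog2WeightDenom) + deltaLumaWeight
                            : (1 << lumaLog2WeightDenom);
    for (int j = 0; j < 2; ++j) {
        p.chromaWeight[j] = chromaFlag ? (1 << chromaLog2WeightDenom) + deltaChromaWeight[j]
                                       : (1 << chromaLog2WeightDenom);
        // ChromaOffsetL0 follows the Clip3-based derivation given above; inferred to 0 otherwise.
        p.chromaOffset[j] = chromaFlag
            ? clip3i(-128, 127, 128 + deltaChromaOffset[j]
                     - ((128 * p.chromaWeight[j]) >> chromaLog2WeightDenom))
            : 0;
    }
    return p;
}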
Table 1: weighted prediction parameter syntax
In document JVET-O0244 "AHG 17: regarding the zero delta POC (AHG17: On zero delta POC in reference picture structure) "(JVT conference 15, Godberg Sweden) in the reference picture structure, it is stated that: in the current draft VVC specification, a reference picture is indicated in a Reference Picture Structure (RPS), where abs _ delta _ POC _ st represents the delta POC value, which may be equal to 0. The RPS may indicate in SPS and slice header. Different weights need to be indicated for the same reference picture by this function, which may be needed if scalability of the hierarchy is supported in the access unit and the same POC value is used across the layers. It is noted that when weighted prediction is not enabled, no duplicate reference pictures need to be used. It is also proposed in this document that when weighted prediction is not enabled, the use of zero delta POC values is not allowed.
Table 2: sequence parameter set RBSP (raw byte sequence payload) syntax
Table 3: reference picture list structure syntax
The syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) may exist in the SPS or may exist in the slice header. Depending on whether the syntax structure is contained in the slice header or in the SPS, the following applies:
-if a syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is present in the slice header, this syntax structure represents the reference picture list listIdx of the current picture (the picture comprising the slice).
Otherwise, if a syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is present in the SPS, this syntax structure represents a candidate for the reference picture list listIdx, and the term "current picture" in the semantics specified in the rest of this section refers to: (1) each picture comprising one or more slices, wherein a slice comprises ref _ pic _ list _ idx [ listIdx ] equal to an index in a list of syntax structures ref _ pic _ list _ struct (listIdx, rplsIdx) included in the SPS, (2) each picture in the CVS referring to the SPS.
num _ ref _ entries [ listIdx ] [ rplsIdx ] represents the number of entries in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx). num _ ref _ entries [ listIdx ] [ rplsIdx ] can range from 0 to sps _ max _ dec _ pic _ buffering _ minus1+14 (including leading and trailing).
LTRP _ in _ slice _ header _ flag [ listIdx ] [ rplsIdx ] equal to 0 means that the POC LSB of the LTRP entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is present in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx), and LTRP _ in _ slice _ header _ flag [ listIdx ] [ rplsIdx ] equal to 1 means that the POC LSB of the LTRP entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is not present in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx).
inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 means that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an ILRP entry, and inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 means that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is not an ILRP entry. When inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is not present, it is inferred that the value of inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is equal to 0.
st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 indicates that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an STRP entry, and st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 indicates that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an LTRP entry. When inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is equal to 0 and st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is absent, then the value of st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is inferred to be equal to 1.
The variable NumLtrpEntries [ listIdx ] [ rplsIdx ] is derived as follows:
for(i=0,NumLtrpEntries[listIdx][rplsIdx]=0;i<num_ref_entries[listIdx][rplsIdx];i++)
if(!inter_layer_ref_pic_flag[listIdx][rplsIdx][i]&&!st_ref_pic_flag[listIdx][rplsIdx][i])
NumLtrpEntries[listIdx][rplsIdx]++
abs _ delta _ poc _ st [ listIdx ] [ rplsIdx ] [ i ] represents the value of the variable AbsDeltaPocSt [ listIdx ] [ rplsIdx ] [ i ] as follows:
Figure BDA0003739688710000311
The value of abs_delta_poc_st[listIdx][rplsIdx][i] can range from 0 to 2^15 - 1, inclusive.
strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 means that the value of the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is greater than or equal to 0, and strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 means that the value of the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is less than 0. When strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] is not present, it is inferred that the value of strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] is equal to 1.
The list DeltaPocValSt [ listIdx ] [ rplsIdx ] is derived as follows:
Figure BDA0003739688710000312
rpls _ poc _ lsb _ lt [ listIdx ] [ rplsIdx ] [ i ] represents the value modulo MaxPicOrderCntLsb by the picture order number of the picture referenced by the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx). The syntax element rpls _ poc _ lsb _ lt [ listIdx ] [ rplsIdx ] [ i ] has a length of (log2_ max _ pic _ order _ cnt _ lsb _ minus4+4) bits.
ilrp_idc[listIdx][rplsIdx][i] represents the index, in the list of directly dependent layers, of the ILRP of the ith entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx). The value of ilrp_idc[listIdx][rplsIdx][i] can range from 0 to GeneralLayerIdx[nuh_layer_id] - 1, inclusive.
In Table 2, the weighted prediction parameters are indicated after the reference picture list is indicated. In Table 4, these syntax elements are reordered so that the binarization of the delta POC syntax element is restricted according to the value of the weighted prediction flag.
Table 4: modified sequence parameter set RBSP syntax
Figure BDA0003739688710000321
Figure BDA0003739688710000331
In addition, the value of the delta POC (i.e., the variable AbsDeltaPocSt) may be conditionally recovered at the decoding end as follows:
abs _ delta _ poc _ st [ listIdx ] [ rplsIdx ] [ i ] represents the value of the variable AbsDeltaPocSt [ listIdx ] [ rplsIdx ] [ i ], as follows:
Figure BDA0003739688710000332
The triangular partitioning mode (TPM) and the geometric motion partitioning (GEO) mode, also referred to as the triangular fusion mode and the geometric fusion mode, respectively, are partitioning techniques that enable non-horizontal and non-vertical boundaries between prediction partitions, where prediction unit PU1 and prediction unit PU2 are combined in a region by a weighted averaging process applied to the subsets of their samples associated with different color components. The use of the TPM enables the boundary between the prediction partitions to lie along a diagonal of the rectangular block, while the boundary realized with GEO can be located at an arbitrary position. In the region where the weighted averaging process is applied, the integer numbers within the squares represent the weights W_PU1 applied to the luma component of the prediction unit PU1. In one example, the weight W_PU2 applied to the luma component of the prediction unit PU2 is calculated as follows:
W_PU2 = 8 - W_PU1
the weight applied to the chrominance component of the corresponding prediction unit may be different from the weight applied to the luminance component of the corresponding prediction unit.
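For illustration only, the following C sketch shows how a single luma sample in the weighted-averaging region could be combined under the 3-bit weights described above; the function name and the rounding offset are assumptions made for this example and are not part of the specification text.

#include <stdint.h>

/* Hypothetical helper: blend one luma sample of PU1 and PU2 with a 3-bit weight,
 * where wPU1 is taken from the weight pattern and wPU2 = 8 - wPU1. */
static inline int16_t blend_luma_sample(int16_t pPU1, int16_t pPU2, int wPU1)
{
    int wPU2 = 8 - wPU1;                                       /* W_PU2 = 8 - W_PU1 */
    return (int16_t)((pPU1 * wPU1 + pPU2 * wPU2 + 4) >> 3);    /* rounded weighted average */
}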
The detailed information on TPM syntax is shown in table 5, where 4 syntax elements are used to indicate information on TPM:
MergeTriangleFlag is a flag identifying whether the TPM is selected ("0" means that the TPM is not selected; otherwise, the TPM is selected);
merge_triangle_split_dir is the partition direction flag of the TPM ("0" represents the partition direction from the top-left corner to the bottom-right corner; otherwise, the partition direction is from the top-right corner to the bottom-left corner);
merge _ triangle _ idx0 and merge _ triangle _ idx1 represent indices of fusion candidates 0 and 1 for the TPM.
Table 5: fused data syntax including syntax for a TPM
Figure BDA0003739688710000341
Figure BDA0003739688710000351
Specifically, the TPM is proposed in document JVET-L0124 "CE10.3.1.b: Triangular prediction unit mode" by R.-L. Liao and C.S. Lim (12th JVET meeting, October 2018, Macao, China). GEO is proposed in document JVET-O0489 "Non-CE4: Geometrical partitioning for inter blocks" by S. Esenlik, H. Gao, A. Filippov, V. Rufitskiy, A.M. Kotra, B. Wang, E. Alshina, M. Bläser, and J. Sauer (15th JVET meeting, July 2019, Gothenburg, Sweden).
The method of the present invention for coordinating the TPM and/or GEO with WP is to disable the TPM and/or GEO when WP is applied. A first embodiment is shown in Table 6; it can be implemented by checking whether the value of the variable weightedPredFlag of the coding unit (coding unit) is equal to 0.
The variable weightedPredFlag is derived as follows:
-if slice _ type is equal to P, then weightedPredFlag is set equal to pps _ weighted _ pred _ flag.
Otherwise, if slice_type is equal to B, then weightedPredFlag is set equal to pps_weighted_bipred_flag.
The slice-level weighted prediction process may be switched on or off by using the syntax elements pps_weighted_pred_flag and sps_weighted_pred_flag, indicated at the picture level and the sequence level, respectively.
As described above, the variable weightedPredFlag indicates whether or not slice-level weighted prediction can be used in obtaining inter-predicted samples for a slice.
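A minimal C sketch of this derivation is given below; the enum values and the struct carrying the PPS flags are assumptions made only to keep the example self-contained.

#include <stdbool.h>

typedef enum { SLICE_B = 0, SLICE_P = 1, SLICE_I = 2 } SliceType;   /* assumed encoding */

typedef struct {
    bool pps_weighted_pred_flag;     /* weighted prediction for P slices, from the PPS */
    bool pps_weighted_bipred_flag;   /* weighted bi-prediction for B slices, from the PPS */
} PpsFlags;

/* First embodiment: weightedPredFlag follows the PPS flag that matches the slice type
 * (I slices do not use inter prediction, so only P and B are relevant here). */
static bool derive_weighted_pred_flag(SliceType slice_type, const PpsFlags *pps)
{
    if (slice_type == SLICE_P)
        return pps->pps_weighted_pred_flag;
    return pps->pps_weighted_bipred_flag;   /* slice_type == SLICE_B */
}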
Table 6: fusion data syntax for coordinating TPM and WP in the present invention
Figure BDA0003739688710000361
Figure BDA0003739688710000371
ciip_flag[x0][y0] indicates whether the joint inter-intra fusion prediction mode is applied to the current coding unit. The array indices x0, y0 represent the position (x0, y0) of the top-left luma sample of the considered coding block relative to the top-left luma sample of the picture.
When ciip _ flag [ x0] [ y0] is not present, then ciip _ flag [ x0] [ y0] is inferred as follows:
-if all the following conditions are true, then ciip _ flag [ x0] [ y0] is inferred to be equal to 1:
-sps _ ciip _ enabled _ flag is equal to 1;
-general _ merge _ flag [ x0] [ y0] equal to 1;
-merge_subblock_flag[x0][y0] is equal to 0;
-regular _ merge _ flag x0 y0 equal to 0;
-cbWidth less than 128;
-cbHeight less than 128;
-cbWidth * cbHeight is greater than or equal to 64.
-otherwise, conclude that ciip _ flag [ x0] [ y0] is equal to 0.
When ciip_flag[x0][y0] is equal to 1, the variable IntraPredModeY[x][y] is set equal to INTRA_PLANAR, where x = x0..x0 + cbWidth - 1 and y = y0..y0 + cbHeight - 1.
The variable MergeTriangleFlag [ x0] [ y0] representing whether triangle-based motion compensation is used to generate the prediction samples for the current coding unit is derived as follows when decoding B slices:
-MergeTriangleFlag [ x0] [ y0] is set equal to 1 if all of the following conditions are true:
-sps _ triangle _ enabled _ flag is equal to 1;
-slice _ type is equal to B;
-general _ merge _ flag [ x0] [ y0] equal to 1;
-MaxNumTriangleMergeCand is greater than or equal to 2;
-cbWidth * cbHeight is greater than or equal to 64;
-regular_merge_flag[x0][y0] is equal to 0;
-merge_subblock_flag[x0][y0] is equal to 0;
-ciip _ flag [ x0] [ y0] equal to 0;
-weightedPredFlag equals 0.
-otherwise, MergeTriangleFlag [ x0] [ y0] is set equal to 0.
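The condition list above can be summarized by the following C sketch; the parameter struct is an assumption introduced only for this example, and the checks mirror the bullet list, including the additional weightedPredFlag == 0 condition of the first embodiment.

#include <stdbool.h>

typedef struct {
    bool sps_triangle_enabled_flag;
    bool slice_type_is_B;
    bool general_merge_flag;
    int  MaxNumTriangleMergeCand;
    int  cbWidth, cbHeight;
    bool regular_merge_flag;
    bool merge_subblock_flag;
    bool ciip_flag;
    bool weightedPredFlag;
} TriangleCtx;

/* MergeTriangleFlag is 1 only if every condition of the list holds. */
static bool derive_merge_triangle_flag(const TriangleCtx *c)
{
    return c->sps_triangle_enabled_flag &&
           c->slice_type_is_B &&
           c->general_merge_flag &&
           c->MaxNumTriangleMergeCand >= 2 &&
           c->cbWidth * c->cbHeight >= 64 &&
           !c->regular_merge_flag &&
           !c->merge_subblock_flag &&
           !c->ciip_flag &&
           !c->weightedPredFlag;   /* coordination with WP: disable the TPM when WP is applied */
}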
A second example is shown in Table 7. If weightedPredFlag is equal to 1, the syntax element max_num_merge_cand_minus_max_num_triangle_cand is not present, and it is inferred that this syntax element has a value such that MaxNumTriangleMergeCand is less than 2.
Table 7: universal stripe header syntax for coordinating TPM and WP in the present invention
Figure BDA0003739688710000381
Figure BDA0003739688710000391
Figure BDA0003739688710000401
Figure BDA0003739688710000411
Figure BDA0003739688710000421
Figure BDA0003739688710000431
In particular, the following semantics may be used for the second embodiment:
max_num_merge_cand_minus_max_num_triangle_cand represents the maximum number of triangle fusion mode candidates supported in the slice subtracted from MaxNumMergeCand.
When max_num_merge_cand_minus_max_num_triangle_cand is not present, sps_triangle_enabled_flag is equal to 1, slice_type is equal to B, weightedPredFlag is equal to 0, and MaxNumMergeCand is greater than or equal to 2, max_num_merge_cand_minus_max_num_triangle_cand is inferred to be equal to pps_max_num_merge_cand_minus_max_num_triangle_cand_plus1 - 1.
When max_num_merge_cand_minus_max_num_triangle_cand is not present, sps_triangle_enabled_flag is equal to 1, slice_type is equal to B, weightedPredFlag is equal to 1, and MaxNumMergeCand is greater than or equal to 2, max_num_merge_cand_minus_max_num_triangle_cand is inferred to be equal to MaxNumMergeCand or MaxNumMergeCand - 1.
The maximum number of triangle fusion mode candidates, MaxNumTriangleMergeCand, is derived as follows:
MaxNumTriangleMergeCand=MaxNumMergeCand-max_num_merge_cand_minus_max_num_triangle_cand
When max_num_merge_cand_minus_max_num_triangle_cand is present, the value of MaxNumTriangleMergeCand can range from 2 to MaxNumMergeCand, inclusive.
When max_num_merge_cand_minus_max_num_triangle_cand is not present and sps_triangle_enabled_flag is equal to 0 or MaxNumMergeCand is less than 2, MaxNumTriangleMergeCand is set equal to 0.
When MaxNumTriangleMergeCand is equal to 0, the triangle fusion mode is not allowed for the current slice.
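A possible decoder-side sketch of this inference is shown below in C; the function interface and the fallback value used when the syntax element is absent under weighted prediction are assumptions chosen to match the description (MaxNumTriangleMergeCand is forced below 2, so the TPM is effectively disabled).

/* Second embodiment (sketch): when weighted prediction is active for the slice, the
 * syntax element max_num_merge_cand_minus_max_num_triangle_cand is not signalled and
 * is inferred as MaxNumMergeCand (or MaxNumMergeCand - 1), so the result is below 2. */
static int derive_max_num_triangle_merge_cand(int MaxNumMergeCand,
                                              int weightedPredFlag,
                                              int signalled_delta /* used only when WP is off */)
{
    if (MaxNumMergeCand < 2)
        return 0;
    int delta = weightedPredFlag ? MaxNumMergeCand : signalled_delta;
    return MaxNumMergeCand - delta;   /* MaxNumTriangleMergeCand */
}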
The mechanism disclosed by the invention is not only applicable to TPM and GEO, but also applicable to other non-rectangular prediction and segmentation modes, such as an intra-frame and inter-frame joint prediction mode based on triangulation.
Since the TPM and GEO are applied only to B slices, the variable weightedPredFlag in the above embodiments can be directly replaced with the variable pps_weighted_bipred_flag.
The third embodiment can be implemented by checking whether the value of the variable weightedPredFlag of the coding unit is equal to 0, as shown in Table 6.
The variable weightedPredFlag is derived as follows:
-weightedPredFlag is set to 0 if all of the following conditions are true:
luma _ weight _ l0_ flag [ i ] is equal to 0, where i ranges from 0 to NumRefIdxActive [0 ];
luma _ weight _ l1_ flag [ i ] equal to 0, where i ranges from 0 to NumRefIdxActive [1 ];
chroma _ weight _ l0_ flag [ i ] is equal to 0, where i ranges from 0 to NumRefIdxActive [0 ];
chroma _ weight _ l1_ flag [ i ] is equal to 0, where i ranges from 0 to NumRefIdxActive [1 ].
Else, weightedPredFlag is set to 1.
The derivation process of weightedPredFlag shows that if all the weighting flags of the luma and chroma components for all reference indices of the current slice are 0, weighted prediction is disabled for the current slice; otherwise, weighted prediction may be used for the current slice.
As described above, the variable weightedPredFlag indicates whether or not slice-level weighted prediction can be used in obtaining inter-predicted samples for a slice.
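A C sketch of the third-embodiment derivation is given below; the struct holding the per-reference-index weight flags and the array bound are assumptions made only for the example, and the loop bounds follow the ranges listed above.

#include <stdbool.h>

#define MAX_REF 16   /* assumed upper bound on reference indices */

typedef struct {
    int  NumRefIdxActive[2];
    bool luma_weight_flag[2][MAX_REF];    /* luma_weight_l0_flag / luma_weight_l1_flag */
    bool chroma_weight_flag[2][MAX_REF];  /* chroma_weight_l0_flag / chroma_weight_l1_flag */
} SliceWpFlags;

/* Third embodiment: weightedPredFlag is 0 only when every luma and chroma weight flag of
 * every listed reference index in both reference picture lists is 0. */
static bool derive_weighted_pred_flag_from_wp_flags(const SliceWpFlags *s)
{
    for (int list = 0; list < 2; list++)
        for (int i = 0; i <= s->NumRefIdxActive[list] && i < MAX_REF; i++)
            if (s->luma_weight_flag[list][i] || s->chroma_weight_flag[list][i])
                return true;   /* weighted prediction may be used for the slice */
    return false;              /* weighted prediction disabled for the slice */
}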
A fourth embodiment is shown in table 6, where weightedPredFlag is replaced with slice _ weighted _ pred _ flag, which is indicated in the slice header, as shown in table 8.
As described above, the syntax element slice_weighted_pred_flag indicates whether slice-level weighted prediction can be used when obtaining inter prediction samples of a slice.
Table 8: generic slice header syntax for indicating slice-level weighted prediction flags in the present invention
Figure BDA0003739688710000441
Figure BDA0003739688710000451
Figure BDA0003739688710000461
Figure BDA0003739688710000471
Figure BDA0003739688710000481
Figure BDA0003739688710000491
In particular, the following semantics may be used for the fourth embodiment:
slice _ weighted _ pred _ flag equal to 0 means that weighted prediction is not applied to the current slice, and slice _ weighted _ pred _ flag equal to 1 means that weighted prediction is applied to the current slice. When slice _ weighted _ pred _ flag does not exist, the value of slice _ weighted _ pred _ flag is inferred to be 0.
A fifth embodiment disables the block-level TPM through a consistency constraint. For a TPM-coded block, the weighting factors of the luma and chroma components of the reference pictures corresponding to the inter prediction values P0 (710) and P1 (720) (shown in Fig. 7) may not be present.
In more detail, refIdxA and predListFlagA represent the reference index and reference picture list of the inter prediction value P0, respectively; refIdxB and predListFlagB represent the reference index and reference picture list of the inter prediction value P1, respectively.
The variables lumaWeightedFlag and chromaWeightedFlag are derived as follows:
lumaWeightedFlagA = predListFlagA ? luma_weight_l1_flag[refIdxA] : luma_weight_l0_flag[refIdxA]
lumaWeightedFlagB = predListFlagB ? luma_weight_l1_flag[refIdxB] : luma_weight_l0_flag[refIdxB]
chromaWeightedFlagA = predListFlagA ? chroma_weight_l1_flag[refIdxA] : chroma_weight_l0_flag[refIdxA]
chromaWeightedFlagB = predListFlagB ? chroma_weight_l1_flag[refIdxB] : chroma_weight_l0_flag[refIdxB]
lumaWeightedFlag = lumaWeightedFlagA || lumaWeightedFlagB
chromaWeightedFlag = chromaWeightedFlagA || chromaWeightedFlagB
The requirement of code stream consistency is as follows: lumaWeightedFlag and chromaWeightedFlag may both be equal to 0.
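The constraint can be checked with a small C sketch like the one below, for example on the encoder side before the TPM is selected for a block; the array arguments and bounds are assumptions, and the derivation of the two flags follows the formulas above.

#include <stdbool.h>

/* Fifth embodiment (sketch): for a TPM-coded block, the WP weighting flags of the
 * reference pictures used by P0 and P1 must all be 0. The arrays stand for
 * luma_weight_l0/_l1_flag and chroma_weight_l0/_l1_flag, indexed [list][refIdx]. */
static bool tpm_block_is_conformant(bool predListFlagA, int refIdxA,
                                    bool predListFlagB, int refIdxB,
                                    const bool luma_weight_flag[2][16],
                                    const bool chroma_weight_flag[2][16])
{
    bool lumaWeightedFlag =
        luma_weight_flag[predListFlagA ? 1 : 0][refIdxA] ||
        luma_weight_flag[predListFlagB ? 1 : 0][refIdxB];
    bool chromaWeightedFlag =
        chroma_weight_flag[predListFlagA ? 1 : 0][refIdxA] ||
        chroma_weight_flag[predListFlagB ? 1 : 0][refIdxB];
    return !lumaWeightedFlag && !chromaWeightedFlag;   /* both flags must be 0 */
}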
A sixth embodiment disables the hybrid weighted sample prediction process for TPM coding blocks when explicit weighted prediction is used.
FIGS. 7 and 8 illustrate the basic ideas of TPM and GEO, respectively, in the present invention. It should be noted that embodiments of the TPM may also be used in GEO mode.
For a TPM-coded block, if a weighting factor of the luma or chroma component of the reference picture corresponding to the inter prediction value P0 (710) or P1 (720) is present, the inter prediction value of the block is generated using a weighting process based on the WP parameters (the WP parameters 730 {w0, O0} and 740 {w1, O1} of P0 and P1); otherwise, the inter prediction value of the block (750) is generated using a weighting process based on the blending weighting parameters. As shown in Fig. 9, the inter prediction value 901 requires two prediction values P0 (911) and P1 (912) that have an overlap region 921, where non-zero weights are applied to blocks 911 and 912 to partially blend the prediction values P0 (911) and P1 (912). In Fig. 9, the blocks adjacent to block 901 are denoted as blocks 931, 932, 933, 934, 935, and 936. Fig. 8 illustrates the difference between the TPM fusion mode and the GEO fusion mode. When the GEO fusion mode is used, the overlap region between the prediction values 851 and 852 does not have to be located on a diagonal of the inter prediction block 850. The prediction values P0 (851) and P1 (852) may be obtained by copying blocks 810 and 820 from other pictures, and the weights and offsets {w0, O0} (830) and {w1, O1} (840) may or may not be applied to blocks 810 and 820, respectively.
In more detail, refIdxA and predListFlagA represent the reference index and reference picture list of the inter prediction value P0, respectively; refIdxB and predListFlagB represent the reference index and reference picture list of the inter prediction value P1, respectively.
The variables lumaWeightedFlag and chromaWeightedFlag are derived as follows:
lumaWeightedFlagA = predListFlagA ? luma_weight_l1_flag[refIdxA] : luma_weight_l0_flag[refIdxA]
lumaWeightedFlagB = predListFlagB ? luma_weight_l1_flag[refIdxB] : luma_weight_l0_flag[refIdxB]
chromaWeightedFlagA = predListFlagA ? chroma_weight_l1_flag[refIdxA] : chroma_weight_l0_flag[refIdxA]
chromaWeightedFlagB = predListFlagB ? chroma_weight_l1_flag[refIdxB] : chroma_weight_l0_flag[refIdxB]
lumaWeightedFlag = lumaWeightedFlagA || lumaWeightedFlagB
chromaWeightedFlag = chromaWeightedFlagA || chromaWeightedFlagB
Then, if lumaWeightedFlag is true, the explicit weighting process is invoked; if lumaWeightedFlag is false, the blending weighting process is invoked. The chroma components are likewise handled according to chromaWeightedFlag.
In an alternative embodiment, the weighted flags are considered to be used jointly for all components: if either lumaWeightedFlag or chromaWeightedFlag is true, the explicit weighting process is invoked; if both lumaWeightedFlag and chromaWeightedFlag are false, the blending weighting process is invoked.
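The selection between the two weighting processes can be sketched in C as follows; the function pointers stand in for the explicit and blending sample prediction processes described below, and the interface is an assumption made for this example.

#include <stdbool.h>

typedef void (*WeightingProcess)(void);   /* placeholder for the actual sample processes */

/* Sixth embodiment (sketch): per colour component, invoke the explicit weighting process
 * when a WP weighting factor exists for P0 or P1, otherwise invoke the blending process.
 * The joint variant shares one decision between luma and chroma. */
static void select_weighting_process(bool lumaWeightedFlag, bool chromaWeightedFlag,
                                     WeightingProcess explicit_wp, WeightingProcess blending,
                                     bool joint_for_all_components)
{
    if (joint_for_all_components) {
        if (lumaWeightedFlag || chromaWeightedFlag) explicit_wp(); else blending();
        return;
    }
    if (lumaWeightedFlag)   explicit_wp(); else blending();   /* luma component */
    if (chromaWeightedFlag) explicit_wp(); else blending();   /* chroma components */
}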
The explicit weighting process for rectangular blocks predicted using the bi-directional prediction mechanism is performed as follows:
the inputs to the process include:
two variables nCbW and nCbH, representing the width and height of the current coding block;
-two (nCbW) x (nCbH) arrays predSamplesA and predSamplesB;
-prediction list flags predlistflag a and predlistflag b;
-reference indices refIdxA and refIdxB;
-a variable cIdx representing a color component index;
-a sample bit depth bitDepth.
The output of this process includes an (nCbW) × (nCbH) array pbSamples composed of predicted sample values.
The variable shift1 is set equal to Max (2, 14-bitDepth).
The variables log2Wd, o0, o1, w0 and w1 are derived as follows:
If cIdx is equal to 0 (luma samples), the following applies:
log2Wd = luma_log2_weight_denom + shift1
w0 = predListFlagA ? LumaWeightL1[refIdxA] : LumaWeightL0[refIdxA]
w1 = predListFlagB ? LumaWeightL1[refIdxB] : LumaWeightL0[refIdxB]
o0 = (predListFlagA ? luma_offset_l1[refIdxA] : luma_offset_l0[refIdxA]) << (BitDepthY - 8)
o1 = (predListFlagB ? luma_offset_l1[refIdxB] : luma_offset_l0[refIdxB]) << (BitDepthY - 8)
Otherwise (cIdx is not equal to 0, chroma samples), the following applies:
log2Wd = ChromaLog2WeightDenom + shift1
w0 = predListFlagA ? ChromaWeightL1[refIdxA][cIdx - 1] : ChromaWeightL0[refIdxA][cIdx - 1]
w1 = predListFlagB ? ChromaWeightL1[refIdxB][cIdx - 1] : ChromaWeightL0[refIdxB][cIdx - 1]
o0 = (predListFlagA ? ChromaOffsetL1[refIdxA][cIdx - 1] : ChromaOffsetL0[refIdxA][cIdx - 1]) << (BitDepthC - 8)
o1 = (predListFlagB ? ChromaOffsetL1[refIdxB][cIdx - 1] : ChromaOffsetL0[refIdxB][cIdx - 1]) << (BitDepthC - 8)
The prediction sample pbSamples[x][y] is derived as follows, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesA[x][y]*w0+predSamplesB[x][y]*w1+((o0+o1+1)<<log2Wd))>>(log2Wd+1))
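The explicit weighting equation above can be written as a compact C sketch; the clipping helper and the fixed array stride are assumptions introduced to keep the example self-contained.

#include <stdint.h>

static inline int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* Explicit weighted bi-prediction for one block: combines predSamplesA/B with the
 * weights w0/w1, offsets o0/o1 and denominator log2Wd derived above. */
static void weighted_bipred_block(const int16_t *predA, const int16_t *predB, int16_t *dst,
                                  int nCbW, int nCbH, int stride,
                                  int w0, int w1, int o0, int o1, int log2Wd, int bitDepth)
{
    int maxVal = (1 << bitDepth) - 1;
    for (int y = 0; y < nCbH; y++)
        for (int x = 0; x < nCbW; x++) {
            int v = (predA[y * stride + x] * w0 + predB[y * stride + x] * w1 +
                     ((o0 + o1 + 1) << log2Wd)) >> (log2Wd + 1);
            dst[y * stride + x] = (int16_t)clip3(0, maxVal, v);
        }
}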
the parameters of the slice-level weighted prediction can be represented as a set of variables that are assigned to each element in the reference picture list. The index of each element is further denoted as "i". These parameters may include:
-LumaWeightL0[i];
luma_offset_l0[i] represents an additional offset applied to the luma prediction value for list 0 prediction using RefPicList[0][i]. The value of luma_offset_l0[i] may range from -128 to 127, inclusive. When luma_weight_l0_flag[i] is equal to 0, luma_offset_l0[i] is inferred to be equal to 0.
The variable LumaWeightL0[i] is derived to be equal to (1 << luma_log2_weight_denom) + delta_luma_weight_l0[i]. When luma_weight_l0_flag[i] is equal to 1, the value of delta_luma_weight_l0[i] may range from -128 to 127, inclusive. When luma_weight_l0_flag[i] is equal to 0, LumaWeightL0[i] is inferred to be equal to 2^luma_log2_weight_denom.
The hybrid weighting process for a rectangular block predicted using the bi-directional prediction mechanism is performed as follows:
the inputs to the process include:
two variables nCbW and nCbH, representing the width and height of the current coding block;
-two (nCbW) x (nCbH) arrays predSamplesLA and predSamplesLB;
-a variable triangleDir representing the segmentation direction;
the variable cIdx, representing the color component index.
The output of this process includes an (nCbW) × (nCbH) array pbSamples composed of predicted sample values.
The variable nCbR is derived as follows:
nCbR=(nCbW>nCbH)?(nCbW/nCbH):(nCbH/nCbW)
the variable bitDepth is derived as follows:
-If cIdx is equal to 0, bitDepth is set equal to BitDepthY.
-Otherwise, bitDepth is set equal to BitDepthC.
The variables shift1 and offset1 are derived as follows:
variable shift1 is set equal to Max (5, 17-bitDepth);
the variable offset1 is set equal to 1< < (shift 1-1).
From the values of triangleDir, wS, and cIdx, the prediction sample pbSamples[x][y] is derived as follows, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
the variable wIdx is derived as follows:
if cIdx equals 0 and triangleDir equals 0, then the following applies:
wIdx=(nCbW>nCbH)?(Clip3(0,8,(x/nCbR-y)+4)):(Clip3(0,8,(x-y/nCbR)+4))
otherwise, if cIdx equals 0 and triangleDir equals 1, then the following applies:
wIdx=(nCbW>nCbH)?(Clip3(0,8,(nCbH-1-x/nCbR-y)+4)):(Clip3(0,8,(nCbW-1-x-y/nCbR)+4))
otherwise, if cIdx is greater than 0 and triangleDir is equal to 0, then the following applies:
wIdx=(nCbW>nCbH)?(Clip3(0,4,(x/nCbR-y)+2)):(Clip3(0,4,(x-y/nCbR)+2))
otherwise, if cIdx is greater than 0 and triangleDir is equal to 1, then the following applies:
wIdx=(nCbW>nCbH)?(Clip3(0,4,(nCbH-1-x/nCbR-y)+2)):(Clip3(0,4,(nCbW-1-x-y/nCbR)+2))
the variable wValue representing the weight of the prediction sample is derived from wIdx and cIdx as follows:
wValue=(cIdx==0)?Clip3(0,8,wIdx):Clip3(0,8,wIdx*2)
the prediction sample values are derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesLA[x][y]*wValue+predSamplesLB[x][y]*(8-wValue)+offset1)>>shift1)
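For reference, the triangle blending above can be sketched in C as follows; only the luma (cIdx equal to 0) weight derivation and the final blend are shown, and the helper names and array stride are assumptions made for this example.

#include <stdint.h>

static inline int clip3i(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

/* Triangle (TPM) blending sketch for luma: derive the per-sample weight wValue from the
 * sample position and split direction, then blend predSamplesLA/LB as in the text. */
static void tpm_blend_luma(const int16_t *pLA, const int16_t *pLB, int16_t *dst,
                           int nCbW, int nCbH, int stride, int triangleDir, int bitDepth)
{
    int nCbR    = (nCbW > nCbH) ? (nCbW / nCbH) : (nCbH / nCbW);
    int shift1  = (5 > 17 - bitDepth) ? 5 : 17 - bitDepth;   /* Max(5, 17 - bitDepth) */
    int offset1 = 1 << (shift1 - 1);
    int maxVal  = (1 << bitDepth) - 1;

    for (int y = 0; y < nCbH; y++)
        for (int x = 0; x < nCbW; x++) {
            int wIdx = (triangleDir == 0)
                ? ((nCbW > nCbH) ? clip3i(0, 8, x / nCbR - y + 4)
                                 : clip3i(0, 8, x - y / nCbR + 4))
                : ((nCbW > nCbH) ? clip3i(0, 8, nCbH - 1 - x / nCbR - y + 4)
                                 : clip3i(0, 8, nCbW - 1 - x - y / nCbR + 4));
            int wValue = clip3i(0, 8, wIdx);   /* cIdx == 0 case */
            int v = (pLA[y * stride + x] * wValue +
                     pLB[y * stride + x] * (8 - wValue) + offset1) >> shift1;
            dst[y * stride + x] = (int16_t)clip3i(0, maxVal, v);
        }
}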
for geometric modes, the hybrid weighting process for rectangular blocks predicted using the bi-prediction mechanism is performed as follows:
the inputs to the process include:
two variables nCbW and nCbH, representing the width and height of the current coding block;
two (nCbW) × (nCbH) arrays predSamplesLA and predSamplesLB;
-variable angleIdx representing the angular index of the geometric segmentation;
-a variable distanceIdx representing a distance index of the geometric segmentation;
the variable cIdx, representing the color component index.
The output of this process includes an (nCbW) × (nCbH) array pbSamples composed of predicted sample values and a variable partIdx.
The variable bitDepth is derived as follows:
-If cIdx is equal to 0, bitDepth is set equal to BitDepthY.
-Otherwise, bitDepth is set equal to BitDepthC.
The variables shift1 and offset1 are derived as follows:
variable shift1 is set equal to Max (5, 17-bitDepth);
the variable offset1 is set equal to 1< < (shift 1-1).
The luma weight array sampleWeightL[x][y] and the chroma weight array sampleWeightC[x][y] are derived as follows, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
the values of the following variables are set as follows:
-hwRatio is set to nCbH/nCbW;
-displacementX is set to angleIdx;
-displacementY is set to (displacementX + 8) % 32;
-partIdx is set to (angleIdx >= 13 && angleIdx <= 27) ? 1 : 0;
rho is set to the following value by a look-up table denoted Dis (as shown in tables 8-12):
rho=(Dis[displacementX]<<8)+(Dis[displacementY]<<8)
if one of the following conditions is true, the variable shiftHor is set equal to 0:
angleIdx% 16 equals 8;
the angleIdx% 16 is not equal to 0 and hwRatio is greater than or equal to 1.
Otherwise, shiftHor is set equal to 1.
If shiftHor is equal to 0, then offsetX and offsetY are derived as follows:
offsetX=(256-nCbW)>>1
offsetY=(256-nCbH)>>1+angleIdx<16?(distanceIdx*nCbH)>>3:-((distanceIdx*nCbH)>>3)
otherwise, if shiftHor is equal to 1, then offsetX and offsetY are derived as follows:
offsetX=(256-nCbW)>>1+angleIdx<16?(distanceIdx*nCbW)>>3:-((distanceIdx*nCbW)>>3)
offsetY=(256-nCbH)>>1
the variables weightIdx and weightIdxAbs are calculated by a look-up table (table 9) as follows, where x is 0.. nCbW-1 and y is 0.. nCbH-1:
weightIdx=(((x+offsetX)<<1)+1)*Dis[displacementX]+(((y+offsetY)<<1)+1))*Dis[displacementY]-rho
weightIdxAbs=Clip3(0,26,abs(weightIdx))
The value of sampleWeightL[x][y] is set as follows according to Table 10, denoted GeoFilter, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
sampleWeightL[x][y] = weightIdx <= 0 ? GeoFilter[weightIdxAbs] : 8 - GeoFilter[weightIdxAbs]
The value of sampleWeightC[x][y] is set as follows, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
sampleWeightC[x][y] = sampleWeightL[(x << (SubWidthC - 1))][(y << (SubHeightC - 1))]
description of the drawings: sample sampleWeight L [x][y]Can also be based on sampleWeight L [x-shiftX][y-shiftY]And (5) deducing. If the angleIdx is greater than 4 and less than 12, or the angleIdx is greater than 20 and less than 24, shiftX is equal to the tangent of the division angle and shiftY is equal to 1; otherwise, shiftX equals 1 and shiftY equals the cotangent of the division angle. If the tangent (relative to cotangent) value is infinity, shiftX equals 1 (relative to 0) or shiftY equals 0 (relative to 1).
The prediction sample values are derived as follows, where X denotes L when cIdx is equal to 0 and C otherwise:
pbSamples[x][y] = partIdx ? Clip3(0, (1 << bitDepth) - 1, (predSamplesLA[x][y] * (8 - sampleWeightX[x][y]) + predSamplesLB[x][y] * sampleWeightX[x][y] + offset1) >> shift1) : Clip3(0, (1 << bitDepth) - 1, (predSamplesLA[x][y] * sampleWeightX[x][y] + predSamplesLB[x][y] * (8 - sampleWeightX[x][y]) + offset1) >> shift1)
table 9: lookup table Dis for deriving geometric segmentation distances
idx 0 1 2 4 6 7 8 9 10 12 14 15
Dis[idx] 8 8 8 8 4 2 0 -2 -4 -8 -8 -8
idx 16 17 18 20 22 23 24 25 26 28 30 31
Dis[idx] -8 -8 -8 -8 -4 -2 0 2 4 8 8 8
Table 10: filter weight look-up table GeoFilter for deriving geometric partition filter weights
idx 0 1 2 3 4 5 6 7 8 9 10 11 12 13
GeoFilter[idx] 4 4 4 4 5 5 5 5 5 5 5 6 6 6
idx 14 15 16 17 18 19 20 21 22 23 24 25 26
GeoFilter[idx] 6 6 6 6 7 7 7 7 7 7 7 7 8
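A simplified C sketch of the luma weight derivation for GEO, using the Dis and GeoFilter look-up tables above, is given below; the tables are passed in as parameters (only part of the Dis table is reproduced in this text), the parenthesization of the offset formulas is an interpretation of the expressions above, and the function name is an assumption for this example. The final blending of predSamplesLA/LB with the returned weight follows the pbSamples equation above.

/* GEO luma weight sketch: computes sampleWeightL[x][y] for one position, given the
 * geometry (angleIdx, distanceIdx), the block size, and the Dis / GeoFilter tables. */
static int geo_luma_sample_weight(int x, int y, int nCbW, int nCbH,
                                  int angleIdx, int distanceIdx,
                                  const int Dis[32], const int GeoFilter[27])
{
    int displacementX = angleIdx;
    int displacementY = (displacementX + 8) % 32;
    int hwRatio  = nCbH / nCbW;
    int rho      = (Dis[displacementX] << 8) + (Dis[displacementY] << 8);
    int shiftHor = (angleIdx % 16 == 8 || (angleIdx % 16 != 0 && hwRatio >= 1)) ? 0 : 1;

    int offsetX, offsetY;
    if (shiftHor == 0) {
        offsetX = (256 - nCbW) >> 1;
        offsetY = ((256 - nCbH) >> 1) +
                  (angleIdx < 16 ? (distanceIdx * nCbH) >> 3 : -((distanceIdx * nCbH) >> 3));
    } else {
        offsetX = ((256 - nCbW) >> 1) +
                  (angleIdx < 16 ? (distanceIdx * nCbW) >> 3 : -((distanceIdx * nCbW) >> 3));
        offsetY = (256 - nCbH) >> 1;
    }

    int weightIdx = (((x + offsetX) << 1) + 1) * Dis[displacementX] +
                    (((y + offsetY) << 1) + 1) * Dis[displacementY] - rho;
    int weightIdxAbs = weightIdx < 0 ? -weightIdx : weightIdx;
    if (weightIdxAbs > 26) weightIdxAbs = 26;                   /* Clip3(0, 26, abs(weightIdx)) */

    return (weightIdx <= 0) ? GeoFilter[weightIdxAbs] : 8 - GeoFilter[weightIdxAbs];
}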
In VVC specification Draft 7 (document JVET-P2001-vE: B. Bross, J. Chen, S. Liu, and Y.-K. Wang, output document JVET-P2001 "Versatile Video Coding (Draft 7)", 16th JVET meeting, Geneva, Switzerland, contained in the file JVET-P2001-v14: http://phenix.it-sudparis.eu/JVET/doc_end_user/documents/16_Geneva/wg11/JVET-P2001-v14.zip), the concept of a picture header (PH) is introduced, whereby some syntax elements are moved from the slice header (SH) to the PH to reduce the signalling overhead caused by assigning the same value to the same syntax element in each SH associated with the PH. As shown in Table 11, the syntax element used to control the maximum number of fusion candidates for the TPM fusion mode is indicated in the PH, while the weighted prediction parameters are still indicated in the SH, as shown in Tables 12 and 14. The semantics of the syntax elements used in Tables 12 and 13 are described below:
table 11: header RBSP syntax
Figure BDA0003739688710000551
Image header RBSP semantics
The PH includes information common to all slices of a coded picture (coded picture) associated with the PH.
A non _ reference _ picture _ flag equal to 1 indicates that a picture associated with PH is never used as a reference picture, and a non _ reference _ picture _ flag equal to 0 indicates that a picture associated with PH may or may not be used as a reference picture.
GDR _ pic _ flag equal to 1 indicates that the picture associated with PH is a Gradual Decoding Refresh (GDR) picture, and GDR _ pic _ flag equal to 0 indicates that the picture associated with PH is not a GDR picture.
If a Coded Layer Video Sequence Start (CLVSS) picture that is not the first picture in the bitstream is decoded, then a no _ output _ of _ prior _ pics _ flag affects the output of previously decoded pictures in the Decoded Picture Buffer (DPB).
recovery _ poc _ cnt indicates a recovery point of a decoded picture in output order. If the current picture is a GDR picture associated with PH and there is a picture picA in the Coded Layer Video Sequence (CLVS) that follows the current GDR picture in decoding order and the PicOrderCntVal of the picture picA is equal to the value of PicOrderCntVal plus recovery _ poc _ cnt of the current GDR picture, the picture picA is called a recovery point picture. Otherwise, the first picture in output order for which PicOrderCntVal is greater than the value of PicOrderCntVal plus recovery _ poc _ cnt for the current picture is referred to as the recovery point picture. The recovery point pictures may not precede the current GDR picture in decoding order. The value of recovery _ poc _ cnt may range from 0 to MaxPicOrderCntLsb-1 (including the leading and trailing values).
Description 1: when GDR _ enabled _ flag is equal to 1 and PicOrderCntVal of the current picture is greater than or equal to RpPicOrderCntVal of the associated GDR picture, the current decoded picture and subsequent decoded pictures in output order are exactly matched with corresponding pictures resulting from performing a decoding process starting from an Intra Random Access Point (IRAP) picture (which is located before the associated GDR picture in decoding order if the picture exists).
ph _ pic _ parameter _ set _ id denotes the value of PPS _ pic _ parameter _ set _ id of the PPS currently in use. The value of ph _ pic _ parameter _ set _ id may range from 0 to 63 (including the leading and trailing values).
The requirement of code stream consistency is as follows: the value of temporalld of PH should be greater than or equal to the value of temporalld of a Picture Parameter Set (PPS) with PPS _ pic _ Parameter _ Set _ id equal to PH _ pic _ Parameter _ Set _ id.
An SPS _ poc _ msb _ flag equal to 1 indicates that a syntax element PH _ poc _ msb _ cycle _ present _ flag is present in a PH of a reference Sequence Parameter Set (SPS), and an SPS _ poc _ msb _ flag equal to 0 indicates that the syntax element PH _ poc _ msb _ cycle _ present _ flag is not present in the PH of the reference SPS.
A PH _ poc _ msb _ present _ flag equal to 1 means that a syntax element poc _ msb _ val exists in the PH, and a PH _ poc _ msb _ present _ flag equal to 0 means that the syntax element poc _ msb _ val does not exist in the PH. When vps _ independent _ layer _ flag [ general layer idx [ nuh _ layer _ id ] ] is equal to 0 and a picture is present in the current Access Unit (AU) in the reference layer of the current layer, the value of ph _ poc _ msb _ present _ flag may be equal to 0.
The POC _ MSB _ val represents a value of a Most Significant Bit (MSB) of a picture order number (POC) of the current picture. The syntax element poc _ msb _ val is (poc _ msb _ len _ minus1+1) bits in length.
The sps _ triangle _ enabled _ flag indicates whether or not the triangle-based motion compensation can be used for inter prediction. The sps _ triangle _ enabled _ flag equal to 0 indicates that the syntax may be constrained such that triangle-based motion compensation is not used in a Coded Layer Video Sequence (CLVS) and the merge _ triangle _ split _ dir, merge _ triangle _ idx0, and merge _ triangle _ idx1 are not present in a coding unit (coding unit) syntax of the CLVS, the sps _ triangle _ enabled _ flag equal to 1 indicates that triangle-based motion compensation may be used in the CLVS.
pps_max_num_merge_cand_minus_max_num_triangle_cand_plus1 equal to 0 means that pic_max_num_merge_cand_minus_max_num_triangle_cand is present in the PH of slices referring to the picture parameter set (PPS), and pps_max_num_merge_cand_minus_max_num_triangle_cand_plus1 greater than 0 means that pic_max_num_merge_cand_minus_max_num_triangle_cand is not present in the PH referring to the PPS. The value of pps_max_num_merge_cand_minus_max_num_triangle_cand_plus1 can range from 0 to MaxNumMergeCand - 1.
pic_six_minus_max_num_merge_cand represents the maximum number of fusion motion vector prediction (MVP) candidates supported in the slices associated with the PH, subtracted from 6. The maximum number of fusion MVP candidates, MaxNumMergeCand, is derived as follows:
MaxNumMergeCand = 6 - pic_six_minus_max_num_merge_cand
The value of MaxNumMergeCand may range from 1 to 6, inclusive. When pic_six_minus_max_num_merge_cand is not present, the value of pic_six_minus_max_num_merge_cand is inferred to be equal to pps_six_minus_max_num_merge_cand_plus1 - 1.
Table 12: universal stripe header grammar
Figure BDA0003739688710000571
Figure BDA0003739688710000581
Universal stripe header semantics
When the slice header syntax element slice _ pic _ order _ cnt _ lsb exists, the value of the syntax element is the same for all slice headers of the coded picture.
The variable CuQpDeltaVal represents the difference between the luma quantization parameter of the coding unit including cu_qp_delta_abs and its prediction, and is set equal to 0. The variables CuQpOffsetCb, CuQpOffsetCr, and CuQpOffsetCbCr represent values to be used when determining the respective quantization parameters Qp'Cb, Qp'Cr, and Qp'CbCr of the coding unit including cu_chroma_qp_offset_flag; these variables are all set equal to 0.
slice_pic_order_cnt_lsb indicates the picture order count of the current picture modulo MaxPicOrderCntLsb. The syntax element slice_pic_order_cnt_lsb has a length of (log2_max_pic_order_cnt_lsb_minus4 + 4) bits. The value of slice_pic_order_cnt_lsb can range from 0 to MaxPicOrderCntLsb - 1, inclusive.
When the current picture is a GDR picture, the variable RpPicOrderCntVal is derived as follows:
RpPicOrderCntVal=PicOrderCntVal+recovery_poc_cnt
the slice _ sub _ id represents a sub-picture identifier of a sub-picture including a slice. If slice _ subpbic _ id exists, the value of the variable subpicIdx is derived so that subpicIdList [ subpicIdx ] equals slice _ subpic _ id. Otherwise, if slice _ subppic _ id does not exist, the derived variable SubPicIdx is equal to 0. The length (in bits) of slice _ subpac _ id is derived as follows:
-if sps _ sub _ id _ signalling _ present _ flag is equal to 1, then the length of slice _ sub _ id is equal to sps _ sub _ id _ len _ minus1+ 1.
Otherwise, if ph _ subacid _ signalling _ present _ flag is equal to 1, then the length of slice _ subacid _ id is equal to ph _ subacid _ len _ minus1+ 1.
Otherwise, if pps _ subacid _ signalling _ present _ flag is equal to 1, the length of slice _ subacid _ id is equal to pps _ subacid _ len _ minus1+ 1.
Else, the length of slice _ subapic _ id is equal to Ceil (Log2(sps _ num _ subapics _ minus1+ 1)).
slice _ address represents the slice address of the slice. When slice _ address does not exist, it is inferred that the value of slice _ address is equal to 0.
If rect _ slice _ flag is equal to 0, then the following applies:
-The slice address is the raster-scan tile index.
-The length of slice_address is Ceil(Log2(NumTilesInPic)) bits.
-The value of slice_address can range from 0 to NumTilesInPic - 1, inclusive.
Otherwise, if rect _ slice _ flag is equal to 1, then the following applies:
-The slice address is the slice index of the slice within the SubPicIdx-th sub-picture.
-The length of slice_address is Ceil(Log2(NumSlicesInSubpic[SubPicIdx])) bits.
-The value of slice_address can range from 0 to NumSlicesInSubpic[SubPicIdx] - 1, inclusive.
The requirement of code stream consistency is that the following constraint conditions are applicable:
-if rect _ slice _ flag is equal to 0 or sub _ present _ flag is equal to 0, then the value of slice _ address may not be equal to the value of slice _ address of any other coded slice (coded slice) Network Abstraction Layer (NAL) unit of the same coded picture.
Otherwise, the pair of values of slice _ sub _ id and slice _ address may not be equal to the pair of values of slice _ sub _ id and slice _ address of any other coded slice NAL unit of the same coded picture.
When rect _ slice _ flag is equal to 0, slices in the picture may be arranged in ascending order according to their slice _ address values.
The shape of the slices in the picture may be such that each Coding Tree Unit (CTU), when decoded, may have its entire left boundary and its entire upper boundary composed of picture boundaries or of the boundaries of the previously decoded CTU or CTUs.
num_tiles_in_slice_minus1 + 1 (when present) represents the number of tiles in the slice. The value of num_tiles_in_slice_minus1 can range from 0 to NumTilesInPic - 1, inclusive.
The variable NumCtusInCurrSlice represents the number of CTUs in the current slice, and the list CtbAddrInCurrSlice[i] (where i ranges from 0 to NumCtusInCurrSlice - 1, inclusive) represents the picture raster scan address of the ith coding tree block (CTB) within the slice. The variable NumCtusInCurrSlice and the list CtbAddrInCurrSlice[i] are derived as follows:
Figure BDA0003739688710000601
The variables SubPicLeftBoundaryPos, SubPicTopBoundaryPos, SubPicRightBoundaryPos, and SubPicBotBoundaryPos are derived as follows:
Figure BDA0003739688710000602
slice _ type represents the coding type of the slice, as shown in table 13.
Table 13: association relation between name and slice _ type
Figure BDA0003739688710000603
Figure BDA0003739688710000611
slice _ rpl _ SPS _ flag [ i ] equal to 1 indicates that the reference picture list i of the current slice is derived from one of the syntax structures ref _ pic _ list _ struct (listIdx, rplsIdx) in which listIdx is equal to i in the SPS, and slice _ rpl _ SPS _ flag [ i ] equal to 0 indicates that the reference picture list i of the current slice is derived from the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) in which listIdx is equal to i directly included in the slice header of the current picture.
When slice _ rpl _ sps _ flag [ i ] is not present, the following applies:
-if pic _ rpl _ present _ flag is equal to 1, deducing that the value of slice _ rpl _ sps _ flag [ i ] is equal to pic _ rpl _ sps _ flag [ i ].
Otherwise, if num _ ref _ pic _ lists _ in _ sps [ i ] is equal to 0, the value of ref _ pic _ list _ sps _ flag [ i ] is inferred to be equal to 0.
-otherwise, if num _ ref _ pic _ lists _ in _ sps [ i ] is greater than 0 and rpl1_ idx _ present _ flag is equal to 0, then the value of slice _ rpl _ sps _ flag [1] is inferred to be equal to slice _ rpl _ sps _ flag [0 ].
slice _ rpl _ idx [ i ] represents the index of the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) used to derive the reference picture list i for the current picture to be equal to i in the list of syntax structures ref _ pic _ list _ struct (listIdx, rplsIdx) included in the SPS where listIdx is equal to i. The syntax element slice _ rpl _ idx [ i ] is represented by Ceil (Log2(num _ ref _ pic _ lists _ in _ sps [ i ])) bits. When slice _ rpl _ idx [ i ] is not present, then the value of slice _ rpl _ idx [ i ] is inferred to be equal to 0. slice _ rpl _ idx [ i ] may range from 0 to num _ ref _ pic _ lists _ in _ sps [ i ] -1 (including leading and trailing values). If slice _ rpl _ sps _ flag [ i ] is equal to 1 and num _ ref _ pic _ lists _ in _ sps [ i ] is equal to 1, then the value of slice _ rpl _ idx [ i ] is inferred to be equal to 0. If slice _ rpl _ sps _ flag [ i ] is equal to 1 and rpl1_ idx _ present _ flag is equal to 0, then the value of slice _ rpl _ idx [1] is inferred to be equal to slice _ rpl _ idx [0 ].
The variable RlsIdx [ i ] is derived as follows:
Figure BDA0003739688710000612
slice _ poc _ lsb _ lt [ i ] [ j ] represents the value of the image sequence number of the jth LTRP entry in the ith reference picture list modulo MaxPicOrderCntLsb. The syntax element slice _ poc _ lsb _ lt [ i ] [ j ] has a length of (log2_ max _ pic _ order _ cnt _ lsb _ minus4+4) bits.
The variable PocLsbLt [ i ] [ j ] is derived as follows:
if(pic_rpl_present_flag)
PocLsbLt[i][j]=PicPocLsbLt[i][j] (142)
else
PocLsbLt[i][j]=ltrp_in_slice_header_flag[i][RplsIdx[i]]?slice_poc_lsb_lt[i][j]:rpls_poc_lsb_lt[listIdx][RplsIdx[i]][j] (143)
a slice _ delta _ poc _ msb _ present _ flag [ i ] [ j ] equal to 1 means that slice _ delta _ poc _ msb _ cycle _ lt [ i ] [ j ] is present and a slice _ delta _ poc _ msb _ present _ flag [ i ] [ j ] equal to 0 means that slice _ delta _ poc _ msb _ cycle _ lt [ i ] [ j ] is absent.
Let prevTid0Pic be the previous picture in decoding order that has the same nuh_layer_id as the current picture, whose TemporalId is equal to 0, and that is not a Random Access Skipped Leading (RASL) picture or a Random Access Decodable Leading (RADL) picture. Let setOfPrevPocVals be a set consisting of:
-PicOrderCntVal of prevTid0Pic,
PicOrderCntVal for each picture that is referenced by an entry in RefPicList [0] or RefPicList [1] of prevTid0Pic and has the same nuh layer id as the current picture,
-PicOrderCntVal of each picture following prevTid0Pic in decoding order, having the same nuh layer id as the current picture and preceding the current picture in decoding order.
When pic _ rpl _ present _ flag is equal to 0 and there are multiple values in setofprevpockvvals (the value modulo MaxPicOrderCntLsb is equal to PocLsbLt [ i ] [ j ]), the value of slice _ delta _ poc _ msb _ present _ flag [ i ] [ j ] may be equal to 1.
slice _ delta _ poc _ msb _ cycle _ lt [ i ] [ j ] represents the value of the variable FullPocLt [ i ] [ j ] as follows:
Figure BDA0003739688710000621
The value of slice_delta_poc_msb_cycle_lt[i][j] can range from 0 to 2^(32 - log2_max_pic_order_cnt_lsb_minus4 - 4), inclusive. If slice_delta_poc_msb_cycle_lt[i][j] is not present, the value of slice_delta_poc_msb_cycle_lt[i][j] is inferred to be equal to 0.
num _ ref _ idx _ active _ override _ flag equal to 1 indicates the presence of syntax element num _ ref _ idx _ active _ minus1[0] for P and B slices and syntax element num _ ref _ idx _ active _ minus1[1] for B slices; num _ ref _ idx _ active _ override _ flag equal to 0 means that syntax elements num _ ref _ idx _ active _ minus1[0] and num _ ref _ idx _ active _ minus1[1] are not present. When num _ ref _ idx _ active _ override _ flag does not exist, it is inferred that the value of num _ ref _ idx _ active _ override _ flag is equal to 1.
num _ ref _ idx _ active _ minus1[ i ] is used to derive the variable NumRefidxActive [ i ] shown in equation 145. num _ ref _ idx _ active _ minus1[ i ] can range from 0 to 14 inclusive.
When i is equal to 0 or 1, num _ ref _ idx _ active _ override _ flag is equal to 1 if the current slice is a B slice, and num _ ref _ idx _ active _ minus1[ i ] is not present, num _ ref _ idx _ active _ minus1[ i ] is inferred to be equal to 0.
If the current slice is a P slice, num _ ref _ idx _ active _ override _ flag is equal to 1, and num _ ref _ idx _ active _ minus1[0] is not present, then num _ ref _ idx _ active _ minus1[0] is inferred to be equal to 0.
The variable NumRefIdxActive [ i ] is derived as follows:
Figure BDA0003739688710000631
the value of NumRefIdxActive [ i ] -1 represents the maximum reference index of the reference picture list i that can be used to decode the slice. When the value of NumRefIdxActive [ i ] is equal to 0, the reference index in reference picture list i may not be used to decode the slice.
If the current slice is a P slice, the value of NumRefIdxActive [0] should be greater than 0.
If the current band is a B band, both NumRefIdxActive [0] and NumRefIdxActive [1] should be greater than 0.
Table 14: weighted prediction parameter syntax
Figure BDA0003739688710000632
Figure BDA0003739688710000641
Weighted prediction parameter semantics
luma _ log2_ weight _ denom represents the logarithm to the base 2 of the denominator of all luminance weighting factors. The value of luma _ log2_ weight _ denom can range from 0 to 7 (including the beginning and end values).
delta _ chroma _ log2_ weight _ denom represents the difference in the base 2 logarithm of the denominator of all chroma weighting factors. When delta _ chroma _ log2_ weight _ denom is not present, then delta _ chroma _ log2_ weight _ denom is inferred to be equal to 0.
The variable ChromaLog2weight denom, which is derived to be equal to luma _ log2_ weight _ denom + delta _ chroma _ log2_ weight _ denom, may range in value from 0 to 7 (including the leading and trailing values).
luma _ weight _ l0_ flag [ i ] equal to 1 indicates the presence of weighting factors for the luma components predicted from list0 by RefPicList [0] [ i ], and luma _ weight _ l0_ flag [ i ] equal to 0 indicates the absence of these weighting factors.
chroma _ weight _ l0_ flag [ i ] equal to 1 indicates the presence of weighting factors for chroma predictors for list0 prediction by RefPicList [0] [ i ], and chroma _ weight _ l0_ flag [ i ] equal to 0 indicates the absence of these weighting factors. When chroma _ weight _ l0_ flag [ i ] is not present, then chroma _ weight _ l0_ flag [ i ] is inferred to be equal to 0.
delta _ luma _ weight _ l0[ i ] represents the difference in weighting factors applied to luma predictors predicted for list0 by RefPicList [0] [ i ].
The variable LumaWeightL0[i] is derived to be equal to (1 << luma_log2_weight_denom) + delta_luma_weight_l0[i]. When luma_weight_l0_flag[i] is equal to 1, the value of delta_luma_weight_l0[i] may range from -128 to 127, inclusive. When luma_weight_l0_flag[i] is equal to 0, LumaWeightL0[i] is inferred to be equal to 2^luma_log2_weight_denom.
luma _ offset _ l0[ i ] represents an additional offset applied to the luma prediction value for list0 prediction by RefPicList [0] [ i ]. The value of luma _ offset _ l0[ i ] can range from-128 to 127 (including head and tail). When luma _ weight _ l0_ flag [ i ] is equal to 0, then luma _ offset _ l0[ i ] is inferred to be equal to 0.
delta _ chroma _ weight _ l0[ i ] [ j ] represents the difference in weighting factors applied to chroma predictors list0 prediction by RefPicList [0] [ i ], where j equals 0 for Cb and 1 for Cr.
The variable ChromaWeightL0[i][j] is derived to be equal to (1 << ChromaLog2WeightDenom) + delta_chroma_weight_l0[i][j]. When chroma_weight_l0_flag[i] is equal to 1, the value of delta_chroma_weight_l0[i][j] may range from -128 to 127, inclusive. When chroma_weight_l0_flag[i] is equal to 0, ChromaWeightL0[i][j] is inferred to be equal to 2^ChromaLog2WeightDenom.
delta _ chroma _ offset _ l0[ i ] [ j ] represents the difference of the additional offset applied to the chroma predictor for list0 prediction by RefPicList [0] [ i ], where j equals 0 for Cb and 1 for Cr.
The variable ChromaOffsetl0[ i ] [ j ] is derived as follows:
ChromaOffsetL0[i][j]=Clip3(-128,127,(128+delta_chroma_offset_l0[i][j]-((128*ChromaWeightL0[i][j])>>ChromaLog2WeightDenom)))
The value of delta_chroma_offset_l0[i][j] may range from -4 * 128 to 4 * 127, inclusive. When chroma_weight_l0_flag[i] is equal to 0, ChromaOffsetL0[i][j] is inferred to be equal to 0.
luma _ weight _ L1_ flag [ i ], chroma _ weight _ L1_ flag [ i ], delta _ luma _ weight _ L1[ i ], luma _ offset _ L1[ i ], delta _ chroma _ weight _ L1[ i ] [ j ] and delta _ chroma _ offset _ L1[ i ] [ j ] have the same semantics as luma _ weight _ L0_ flag [ i ], chroma _ weight _ L0_ flag [ i ], delta _ luma _ weight _ L0[ i ], luma _ offset _ L0[ i ], delta _ chroma _ weight _ L0[ i ] [ j ] and delta _ chroma _ offset _ L0[ i ] [ j ], where L0, L38, L5848, List0 and List0 are replaced by List0, List 3, L3, I ] [ j ], respectively.
The variable sumWeightL0Flags is derived as equal to the sum of luma _ weight _ l0_ flag [ i ] +2 × chroma _ weight _ l0_ flag [ i ], where i is 0.. NumRefIdxActive [0] -1.
When slice _ type is equal to B, then the derived variable sumWeightL1Flags is equal to the sum of luma _ weight _ l1_ flag [ i ] +2 × chroma _ weight _ l1_ flag [ i ], where i is 0.. NumRefIdxActive [1] -1.
The requirement of code stream consistency is as follows: when slice_type is equal to P, sumWeightL0Flags should be less than or equal to 24; when slice_type is equal to B, the sum of sumWeightL0Flags and sumWeightL1Flags should be less than or equal to 24.
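The constraint above can be expressed as a short C check; the function is an assumption for illustration and mirrors the sums defined in the preceding two paragraphs.

#include <stdbool.h>

/* Code stream consistency check: the number of indicated weights is bounded by 24. */
static bool wp_weight_count_is_conformant(int sumWeightL0Flags, int sumWeightL1Flags,
                                          bool slice_type_is_B)
{
    if (!slice_type_is_B)                                   /* P slice */
        return sumWeightL0Flags <= 24;
    return sumWeightL0Flags + sumWeightL1Flags <= 24;       /* B slice */
}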
Reference image list structure semantics
The syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) may exist in the SPS, or may exist in the slice header. Depending on whether the syntax structure is contained in the slice header or in the SPS, the following applies:
-if a syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is present in the slice header, this syntax structure represents the reference picture list listIdx of the current picture (the picture comprising the slice).
Otherwise, if a syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is present in the SPS, it represents a candidate for the reference picture list listIdx. The term "current image" in the semantics specified in the rest of this section refers to: (1) each picture comprising one or more slices, wherein a ref _ pic _ list _ idx [ listIdx ] of one or more slices is equal to an index in a list of syntax structures ref _ pic _ list _ struct (listIdx, rplsIdx) included in the SPS, (2) each picture in a Coded Video Sequence (CVS) of the reference SPS.
num _ ref _ entries [ listIdx ] [ rplsIdx ] represents the number of entries in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx). num _ ref _ entries [ listIdx ] [ rplsIdx ] can range from 0 to MaxDecpicBuffMinus1+14 (including head to tail).
LTRP _ in _ slice _ header _ flag [ listIdx ] [ rplsIdx ] equal to 0 indicates that POC LSBs of Long-Term Reference Picture (LTRP) entries in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) are present in the syntax structure ref _ pic _ list _ struct, and LTRP _ in _ slice _ header _ flag [ listIdx ] [ rplsIdx ] equal to 1 indicates that POC LSBs of LTRP entries in the syntax structure ref _ pic _ list _ struct (listIdx, rplsx) are not present in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx).
Inter _ Layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 means that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an Inter-Layer Reference Picture (ILRP) entry, and Inter _ Layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 means that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is not an ILRP entry. When inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is not present, then the value of inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is inferred to be equal to 0.
st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 indicates that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an STRP entry, and st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 indicates that the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is an LTRP entry. When inter _ layer _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is equal to 0 and st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is absent, then the value of st _ ref _ pic _ flag [ listIdx ] [ rplsIdx ] [ i ] is inferred to be equal to 1.
The variable NumLtrpEntries [ listIdx ] [ rplsIdx ] is derived as follows:
Figure BDA0003739688710000661
abs _ delta _ poc _ st [ listIdx ] [ rplsIdx ] [ i ] represents the value of the variable AbsDeltaPocSt [ listIdx ] [ rplsIdx ] [ i ] as follows:
Figure BDA0003739688710000662
The value of abs_delta_poc_st[listIdx][rplsIdx][i] can range from 0 to 2^15 - 1, inclusive.
strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 1 means that the value of the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is greater than or equal to 0, and strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] equal to 0 means that the value of the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx) is less than 0. When strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] is absent, then the value of strp _ entry _ sign _ flag [ listIdx ] [ rplsIdx ] [ i ] is inferred to be equal to 1.
The list DeltaPocValSt [ listIdx ] [ rplsIdx ] is derived as follows:
Figure BDA0003739688710000671
rpls _ poc _ lsb _ lt [ listIdx ] [ rplsIdx ] [ i ] represents the value modulo MaxPicOrderCntLsb by the picture order number of the picture referenced by the ith entry in the syntax structure ref _ pic _ list _ struct (listIdx, rplsIdx). The syntax element rpls _ poc _ lsb _ lt [ listIdx ] [ rplsIdx ] [ i ] has a length of (log2_ max _ pic _ order _ cnt _ lsb _ minus4+4) bits.
ilrp_idx[listIdx][rplsIdx][i] represents the index of the ILRP of the ith entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx) in the list of direct reference layers. The value of ilrp_idx[listIdx][rplsIdx][i] can range from 0 to NumDirectRefLayers[GeneralLayerIdx[nuh_layer_id]] - 1, inclusive.
Thus, different mechanisms may be used to control the GEO/TPM fusion mode, depending on whether WP is applied to the reference pictures used to obtain the reference blocks P0 and P1, by:
-moving the WP parameters listed in Table 14 from the SH into the PH;
-moving the GEO/TPM parameters from the PH back into the SH;
-changing the semantics of MaxNumTriangleMergeCand, i.e. setting MaxNumTriangleMergeCand equal to 0 or 1 for slices that may use a reference picture to which WP is applied (e.g., when at least one flag lumaWeightedFlag is equal to true).
For the TPM fusion mode, reference blocks P0 and P1 are represented in FIG. 7 as 710 and 720, respectively, for example. For the GEO-fusion mode, reference blocks P0 and P1 are denoted in fig. 8 as 810 and 820, respectively, for example.
In one embodiment, when the non-rectangular modes (e.g., the GEO mode and the TPM mode) are indicated as enabled and the WP parameters are indicated in the picture header, the following syntax may be used, as shown in the following table:
table 11: header RBSP syntax
Figure BDA0003739688710000672
Figure BDA0003739688710000681
Figure BDA0003739688710000691
……
The variable WPDisabled is set equal to 1 when the values of luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[j], and chroma_weight_l1_flag[j] are all set to 0, where i = 0..NumRefIdxActive[0] and j = 0..NumRefIdxActive[1]; otherwise, WPDisabled is set equal to 0.
When the variable WPDisabled is set equal to 0, the value of pic _ max _ num _ merge _ cand _ minus _ max _ num _ triangle _ cand is set equal to MaxNumMergeCand.
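A brief C sketch of this picture-header constraint is shown below; the function name and interface are assumptions made only for illustration, and WPDisabled is assumed to have been derived from the weight flags as described above.

/* Sketch of the picture-header constraint: when WPDisabled == 0 (some weight flag is set),
 * pic_max_num_merge_cand_minus_max_num_triangle_cand is set to MaxNumMergeCand, which
 * makes MaxNumTriangleMergeCand equal to 0 and disables the non-rectangular modes. */
static int constrained_pic_max_num_merge_cand_minus_max_num_triangle_cand(int WPDisabled,
                                                                          int coded_value,
                                                                          int MaxNumMergeCand)
{
    return WPDisabled ? coded_value : MaxNumMergeCand;
}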
In a second embodiment, the WP parameters are indicated in the slice header and the non-rectangular modes (e.g., GEO mode and TPM mode) are indicated as enabled. The following table shows an exemplary syntax.
Figure BDA0003739688710000692
Figure BDA0003739688710000701
Figure BDA0003739688710000711
Figure BDA0003739688710000721
Figure BDA0003739688710000731
Figure BDA0003739688710000741
……
The variable WPDisabled is set equal to 1 when the values of luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[j], and chroma_weight_l1_flag[j] are all set to 0, where i = 0..NumRefIdxActive[0] and j = 0..NumRefIdxActive[1]; otherwise, WPDisabled is set equal to 0.
When the variable WPDisabled is set equal to 0, the value of max _ num _ merge _ cand _ minus _ max _ num _ triangle _ cand is set equal to maxnummergeand.
In the above disclosed embodiments, the weighted prediction parameters may be indicated in the picture header or slice header.
In the third embodiment, whether the TPM mode or the GEO mode is enabled is determined while also taking the reference picture list into account, for the case where a block may use non-rectangular prediction together with weighted prediction. When the fusion list of a block includes only one element of reference picture list k, whether the fusion mode is enabled can be determined by the value of the variable WPDisabled[k].
In the fourth embodiment, the fusion list in the non-rectangular inter prediction mode is constructed so that it includes only elements for which weighted prediction is not enabled.
The following part of the description illustrates the fourth embodiment by way of example:
the inputs to the process include:
-luma position (xCb, yCb) of an upper left sample of the current luma coding block relative to an upper left luma sample of the current picture;
-a variable cbWidth, representing the width of the current coding block, in units of luma samples;
the variable cbHeight, representing the height of the current coding block, in units of luma samples.
The output of this process is as follows, where X is 0 or 1:
- the availability flags availableFlagA0, availableFlagA1, availableFlagB0, availableFlagB1 and availableFlagB2 of the neighboring coding units;
- the reference indices refIdxLXA0, refIdxLXA1, refIdxLXB0, refIdxLXB1 and refIdxLXB2 of the neighboring coding units;
- the prediction list usage flags predFlagLXA0, predFlagLXA1, predFlagLXB0, predFlagLXB1 and predFlagLXB2 of the neighboring coding units;
- the 1/16 fractional-sample-precision motion vectors mvLXA0, mvLXA1, mvLXB0, mvLXB1 and mvLXB2 of the neighboring coding units;
- the half-sample interpolation filter indices hpelIfIdxA0, hpelIfIdxA1, hpelIfIdxB0, hpelIfIdxB1 and hpelIfIdxB2;
- the bi-directional prediction weight indices bcwIdxA0, bcwIdxA1, bcwIdxB0, bcwIdxB1 and bcwIdxB2.
For availableFlagB1, refIdxLXB1, predFlagLXB1, mvLXB1, hpelIfIdxB1 and bcwIdxB1, the following applies:
- The luma position (xNbB1, yNbB1) within the neighboring luma coding block is set equal to (xCb + cbWidth - 1, yCb - 1).
- The derivation process of the availability of neighboring blocks specified in section 6.4.4 is invoked, with the current luma position (xCurr, yCurr) set equal to (xCb, yCb), the neighboring luma position (xNbB1, yNbB1), checkPredModeY set equal to TRUE and cIdx set equal to 0 as inputs, and the output is assigned to the block availability flag availableB1.
The variables availableFlagB1, refIdxLXB1, predFlagLXB1, mvLXB1, hpelIfIdxB1 and bcwIdxB1 are derived as follows:
- If availableB1 is equal to FALSE, availableFlagB1 is set equal to 0, both components of mvLXB1 are set equal to 0, refIdxLXB1 is set equal to -1, predFlagLXB1 is set equal to 0 (where X is 0 or 1), hpelIfIdxB1 is set equal to 0 and bcwIdxB1 is set equal to 0.
- Otherwise, availableFlagB1 is set equal to 1 and the following assignments are made:
mvLXB1 = MvLX[xNbB1][yNbB1] (501)
refIdxLXB1 = RefIdxLX[xNbB1][yNbB1] (502)
predFlagLXB1 = PredFlagLX[xNbB1][yNbB1] (503)
hpelIfIdxB1 = HpelIfIdx[xNbB1][yNbB1] (504)
bcwIdxB1 = BcwIdx[xNbB1][yNbB1] (505)
For availableFlagA1, refIdxLXA1, predFlagLXA1, mvLXA1, hpelIfIdxA1 and bcwIdxA1, the following applies:
- The luma position (xNbA1, yNbA1) within the neighboring luma coding block is set equal to (xCb - 1, yCb + cbHeight - 1).
- The derivation process of the availability of neighboring blocks specified in section 6.4.4 is invoked, with the current luma position (xCurr, yCurr) set equal to (xCb, yCb), the neighboring luma position (xNbA1, yNbA1), checkPredModeY set equal to TRUE and cIdx set equal to 0 as inputs, and the output is assigned to the block availability flag availableA1.
The variables availableFlagA1, refIdxLXA1, predFlagLXA1, mvLXA1, hpelIfIdxA1 and bcwIdxA1 are derived as follows:
- If one or more of the following conditions are true, availableFlagA1 is set equal to 0, both components of mvLXA1 are set equal to 0, refIdxLXA1 is set equal to -1, predFlagLXA1 is set equal to 0 (where X is 0 or 1), hpelIfIdxA1 is set equal to 0 and bcwIdxA1 is set equal to 0:
- availableA1 is equal to FALSE;
- availableB1 is equal to TRUE, and the luma positions (xNbA1, yNbA1) and (xNbB1, yNbB1) have the same motion vectors and the same reference indices;
- WPDisabledX[RefIdxLX[xNbA1][yNbA1]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1);
- WPDisabledX[RefIdxLX[xNbB1][yNbB1]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1).
- Otherwise, availableFlagA1 is set equal to 1 and the following assignments are made:
mvLXA1 = MvLX[xNbA1][yNbA1] (506)
refIdxLXA1 = RefIdxLX[xNbA1][yNbA1] (507)
predFlagLXA1 = PredFlagLX[xNbA1][yNbA1] (508)
hpelIfIdxA1 = HpelIfIdx[xNbA1][yNbA1] (509)
bcwIdxA1 = BcwIdx[xNbA1][yNbA1] (510)
For availableFlagB0, refIdxLXB0, predFlagLXB0, mvLXB0, hpelIfIdxB0 and bcwIdxB0, the following applies:
- The luma position (xNbB0, yNbB0) within the neighboring luma coding block is set equal to (xCb + cbWidth, yCb - 1).
- The derivation process of the availability of neighboring blocks specified in section 6.4.4 is invoked, with the current luma position (xCurr, yCurr) set equal to (xCb, yCb), the neighboring luma position (xNbB0, yNbB0), checkPredModeY set equal to TRUE and cIdx set equal to 0 as inputs, and the output is assigned to the block availability flag availableB0.
The variables availableFlagB0, refIdxLXB0, predFlagLXB0, mvLXB0, hpelIfIdxB0 and bcwIdxB0 are derived as follows:
- If one or more of the following conditions are true, availableFlagB0 is set equal to 0, both components of mvLXB0 are set equal to 0, refIdxLXB0 is set equal to -1, predFlagLXB0 is set equal to 0 (where X is 0 or 1), hpelIfIdxB0 is set equal to 0 and bcwIdxB0 is set equal to 0:
- availableB0 is equal to FALSE;
- availableB1 is equal to TRUE, and the luma positions (xNbB1, yNbB1) and (xNbB0, yNbB0) have the same motion vectors and the same reference indices;
- WPDisabledX[RefIdxLX[xNbB0][yNbB0]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1);
- WPDisabledX[RefIdxLX[xNbB1][yNbB1]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1).
- Otherwise, availableFlagB0 is set equal to 1 and the following assignments are made:
mvLXB0 = MvLX[xNbB0][yNbB0] (511)
refIdxLXB0 = RefIdxLX[xNbB0][yNbB0] (512)
predFlagLXB0 = PredFlagLX[xNbB0][yNbB0] (513)
hpelIfIdxB0 = HpelIfIdx[xNbB0][yNbB0] (514)
bcwIdxB0 = BcwIdx[xNbB0][yNbB0] (515)
For availableFlagA0, refIdxLXA0, predFlagLXA0, mvLXA0, hpelIfIdxA0 and bcwIdxA0, the following applies:
- The luma position (xNbA0, yNbA0) within the neighboring luma coding block is set equal to (xCb - 1, yCb + cbHeight).
- The derivation process of the availability of neighboring blocks specified in section 6.4.4 is invoked, with the current luma position (xCurr, yCurr) set equal to (xCb, yCb), the neighboring luma position (xNbA0, yNbA0), checkPredModeY set equal to TRUE and cIdx set equal to 0 as inputs, and the output is assigned to the block availability flag availableA0.
The variables availableFlagA0, refIdxLXA0, predFlagLXA0, mvLXA0, hpelIfIdxA0 and bcwIdxA0 are derived as follows:
- If one or more of the following conditions are true, availableFlagA0 is set equal to 0, both components of mvLXA0 are set equal to 0, refIdxLXA0 is set equal to -1, predFlagLXA0 is set equal to 0 (where X is 0 or 1), hpelIfIdxA0 is set equal to 0 and bcwIdxA0 is set equal to 0:
- availableA0 is equal to FALSE;
- availableA1 is equal to TRUE, and the luma positions (xNbA1, yNbA1) and (xNbA0, yNbA0) have the same motion vectors and the same reference indices;
- WPDisabledX[RefIdxLX[xNbA0][yNbA0]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1);
- WPDisabledX[RefIdxLX[xNbA1][yNbA1]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1).
- Otherwise, availableFlagA0 is set equal to 1 and the following assignments are made:
mvLXA0 = MvLX[xNbA0][yNbA0] (516)
refIdxLXA0 = RefIdxLX[xNbA0][yNbA0] (517)
predFlagLXA0 = PredFlagLX[xNbA0][yNbA0] (518)
hpelIfIdxA0 = HpelIfIdx[xNbA0][yNbA0] (519)
bcwIdxA0 = BcwIdx[xNbA0][yNbA0] (520)
For availableFlagB2, refIdxLXB2, predFlagLXB2, mvLXB2, hpelIfIdxB2 and bcwIdxB2, the following applies:
- The luma position (xNbB2, yNbB2) within the neighboring luma coding block is set equal to (xCb - 1, yCb - 1).
- The derivation process of the availability of neighboring blocks specified in section 6.4.4 is invoked, with the current luma position (xCurr, yCurr) set equal to (xCb, yCb), the neighboring luma position (xNbB2, yNbB2), checkPredModeY set equal to TRUE and cIdx set equal to 0 as inputs, and the output is assigned to the block availability flag availableB2.
The variables availableFlagB2, refIdxLXB2, predFlagLXB2, mvLXB2, hpelIfIdxB2 and bcwIdxB2 are derived as follows:
- If one or more of the following conditions are true, availableFlagB2 is set equal to 0, both components of mvLXB2 are set equal to 0, refIdxLXB2 is set equal to -1, predFlagLXB2 is set equal to 0 (where X is 0 or 1), hpelIfIdxB2 is set equal to 0 and bcwIdxB2 is set equal to 0:
- availableB2 is equal to FALSE;
- availableA1 is equal to TRUE, and the luma positions (xNbA1, yNbA1) and (xNbB2, yNbB2) have the same motion vectors and the same reference indices;
- availableB1 is equal to TRUE, and the luma positions (xNbB1, yNbB1) and (xNbB2, yNbB2) have the same motion vectors and the same reference indices;
- availableFlagA0 + availableFlagA1 + availableFlagB0 + availableFlagB1 is equal to 4;
- WPDisabledX[RefIdxLX[xNbB1][yNbB1]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1);
- WPDisabledX[RefIdxLX[xNbB2][yNbB2]] is set to 0 and the fusion mode is a non-rectangular fusion mode (e.g., the triangle flag of the block at the current luma position (xCurr, yCurr) is set equal to 1).
- Otherwise, availableFlagB2 is set equal to 1 and the following assignments are made:
mvLXB2 = MvLX[xNbB2][yNbB2] (521)
refIdxLXB2 = RefIdxLX[xNbB2][yNbB2] (522)
predFlagLXB2 = PredFlagLX[xNbB2][yNbB2] (523)
hpelIfIdxB2 = HpelIfIdx[xNbB2][yNbB2] (524)
bcwIdxB2 = BcwIdx[xNbB2][yNbB2] (525)
In the embodiments disclosed above, the following variable definitions are used:
When the values of luma_weight_l0_flag[i] and chroma_weight_l0_flag[i] are both set to 0, the variable WPDisabled0[i] is set equal to 1, where i = 0..NumRefIdxActive[0]. Otherwise, the value of WPDisabled0[i] is set equal to 0.
When the values of luma_weight_l1_flag[i] and chroma_weight_l1_flag[i] are both set to 0, the variable WPDisabled1[i] is set equal to 1, where i = 0..NumRefIdxActive[1]. Otherwise, the value of WPDisabled1[i] is set equal to 0.
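A small illustrative sketch (Python; names mirror the variables defined above, with simplified list-based indexing) of the per-reference-index variables used by the fusion list construction of the fourth embodiment:

```python
def derive_wp_disabled_lists(luma_w_l0, chroma_w_l0, luma_w_l1, chroma_w_l1):
    # WPDisabled0[i] / WPDisabled1[i] are 1 when reference index i of the
    # corresponding list carries no explicit luma or chroma weights.
    wp_disabled0 = [0 if (l or c) else 1 for l, c in zip(luma_w_l0, chroma_w_l0)]
    wp_disabled1 = [0 if (l or c) else 1 for l, c in zip(luma_w_l1, chroma_w_l1)]
    return wp_disabled0, wp_disabled1
```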
In another embodiment, a variable SliceMaxNumTriangleMergeCand is defined in the slice header according to one of the following conditions:
- SliceMaxNumTriangleMergeCand = (lumaWeightedFlag || chromaWeightedFlag) ? 0 : MaxNumTriangleMergeCand
- SliceMaxNumTriangleMergeCand = (lumaWeightedFlag || chromaWeightedFlag) ? 1 : MaxNumTriangleMergeCand
- SliceMaxNumTriangleMergeCand = slice_weighted_pred_flag ? 0 : MaxNumTriangleMergeCand
or
- SliceMaxNumTriangleMergeCand = slice_weighted_pred_flag ? 1 : MaxNumTriangleMergeCand
The different cases listed above apply to the different embodiments.
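The four conditions listed above can be illustrated with the following sketch (Python; the variant selector and argument names are illustrative assumptions, not syntax elements):

```python
def slice_max_num_triangle_merge_cand(variant, max_num_triangle_merge_cand,
                                      luma_weighted_flag=False,
                                      chroma_weighted_flag=False,
                                      slice_weighted_pred_flag=False):
    weighted = luma_weighted_flag or chroma_weighted_flag
    if variant == 1:   # first condition
        return 0 if weighted else max_num_triangle_merge_cand
    if variant == 2:   # second condition
        return 1 if weighted else max_num_triangle_merge_cand
    if variant == 3:   # third condition
        return 0 if slice_weighted_pred_flag else max_num_triangle_merge_cand
    return 1 if slice_weighted_pred_flag else max_num_triangle_merge_cand  # fourth
```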
The value of this variable is also used when parsing the fusion information at the block level. The following table shows an exemplary syntax.
[syntax table not reproduced in this text]
The embodiments described above are further detailed below for the case where the non-rectangular inter prediction mode is the GEO mode.
Thus, different mechanisms may be used to control the GEO/TPM fusion mode, depending on whether WP is applied to obtain the reference pictures of reference blocks P0 and P1, by:
- moving the WP parameters listed in table 14 from the SH into the PH;
- moving the GEO parameters from the PH back into the SH;
- changing the semantics of MaxNumGeoMergeCand, i.e. setting MaxNumGeoMergeCand of these slices equal to 0 or 1 when a reference picture to which WP is applied can be used (e.g., at least one flag lumaWeightedFlag is equal to true).
For the GEO-fusion mode, reference blocks P0 and P1 are denoted in fig. 8 as 810 and 820, respectively, for example.
In one embodiment, when indicating that the non-rectangular mode (e.g., GEO mode and TPM mode) is enabled and indicating the WP parameters in the image header, the following syntax may be used, as shown in the following table:
table 11: picture header RBSP syntax
[syntax table not reproduced in this text]
……
The variable WPDisabled is set equal to 1 when the values of luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[j] and chroma_weight_l1_flag[j] are all set to 0, where i = 0..NumRefIdxActive[0] and j = 0..NumRefIdxActive[1]; otherwise, WPDisabled is set equal to 0.
When the variable WPDisabled is set equal to 0, the value of pic_max_num_merge_cand_minus_max_num_geo_cand is set equal to MaxNumMergeCand.
In another embodiment, pic_max_num_merge_cand_minus_max_num_geo_cand is set equal to MaxNumMergeCand - 1.
In a second embodiment, the WP parameters are indicated in the slice header and the non-rectangular mode (e.g., GEO mode and TPM mode) is indicated as enabled. The following table shows an exemplary syntax.
[syntax table not reproduced in this text]
The variable WPDisabled is set equal to 1 when the values of luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[j] and chroma_weight_l1_flag[j] are all set to 0, where i = 0..NumRefIdxActive[0] and j = 0..NumRefIdxActive[1]; otherwise, WPDisabled is set equal to 0.
When the variable WPDisabled is set equal to 0, the value of max_num_merge_cand_minus_max_num_geo_cand is set equal to MaxNumMergeCand.
In another embodiment, when the variable WPDisabled is set equal to 0, max_num_merge_cand_minus_max_num_geo_cand is set equal to MaxNumMergeCand - 1.
In the above disclosed embodiments, the weighted prediction parameters may be indicated in the picture header or slice header.
In other embodiments, the variable SliceMaxNumGeoMergeCand is defined in the slice header according to one of the following conditions:
- SliceMaxNumGeoMergeCand = (lumaWeightedFlag || chromaWeightedFlag) ? 0 : MaxNumGeoMergeCand
- SliceMaxNumGeoMergeCand = (lumaWeightedFlag || chromaWeightedFlag) ? 1 : MaxNumGeoMergeCand
- SliceMaxNumGeoMergeCand = slice_weighted_pred_flag ? 0 : MaxNumGeoMergeCand, or
- SliceMaxNumGeoMergeCand = slice_weighted_pred_flag ? 1 : MaxNumGeoMergeCand
The different cases listed above apply to the different embodiments.
The value of this variable is also used when parsing the fusion information at the block level. The following table shows an exemplary syntax.
7.3.9.7 Fusion data syntax
[syntax table not reproduced in this text]
The related image header semantics are as follows:
pic_max_num_merge_cand_minus_max_num_geo_cand specifies the maximum number of geo fusion mode candidates supported in the slices associated with the picture header, subtracted from MaxNumMergeCand.
When pic_max_num_merge_cand_minus_max_num_geo_cand is not present, sps_geo_enabled_flag is equal to 1, and MaxNumMergeCand is greater than or equal to 2, pic_max_num_merge_cand_minus_max_num_geo_cand is inferred to be equal to pps_max_num_merge_cand_minus_max_num_geo_cand_plus1 - 1.
The maximum number of geo fusion mode candidates, MaxNumGeoMergeCand, is derived as follows:
MaxNumGeoMergeCand = MaxNumMergeCand - pic_max_num_merge_cand_minus_max_num_geo_cand (87)
When pic_max_num_merge_cand_minus_max_num_geo_cand is present, the value of MaxNumGeoMergeCand can range from 2 to MaxNumMergeCand (inclusive).
When pic_max_num_merge_cand_minus_max_num_geo_cand is not present, and sps_geo_enabled_flag is equal to 0 or MaxNumMergeCand is less than 2, MaxNumGeoMergeCand is set equal to 0.
When MaxNumGeoMergeCand is equal to 0, the geo fusion mode is not allowed to be used for the slices associated with the PH.
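Under the assumptions stated in the comments, the picture header semantics above can be sketched as follows (Python; None stands for an absent syntax element, and the argument names simply mirror the syntax elements):

```python
def derive_max_num_geo_merge_cand(max_num_merge_cand, sps_geo_enabled_flag,
                                  pic_delta=None, pps_delta_plus1=1):
    """pic_delta stands for pic_max_num_merge_cand_minus_max_num_geo_cand and
    pps_delta_plus1 for pps_max_num_merge_cand_minus_max_num_geo_cand_plus1."""
    if pic_delta is None:
        if sps_geo_enabled_flag and max_num_merge_cand >= 2:
            pic_delta = pps_delta_plus1 - 1   # inference rule above
        else:
            return 0                          # geo fusion mode not allowed
    return max_num_merge_cand - pic_delta     # equation (87)
```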
Several indication-related aspects are considered in the following embodiments. These aspects are as follows:
- Syntax elements related to the number of fusion mode candidates are indicated in the sequence parameter set (SPS), which makes it possible, in a specific implementation, to derive the number of non-rectangular fusion mode candidates (MaxNumGeoMergeCand) at the SPS level;
when the image comprises only one slice, PH can be indicated in SH;
-defining a PH/SH parameter override mechanism comprising:
■ the PPS flag indicates whether the syntax element of the relevant coding tool is present in PH or SH (cannot be present in both PH and SH);
■ in particular, the reference picture list and the weighted prediction table may use this mechanism;
- the prediction weight table becomes the fifth class of data that can be indicated either in the PH or in the SH (like ALF, deblocking filtering, RPL and SAO);
-when weighted prediction is enabled for a picture, all slices in the picture are required to have the same reference picture list;
-if only certain slice types are used in pictures associated with PH, the inter and intra related syntax elements will be conditionally indicated.
■ specifically, two flags are introduced: pic_inter_slice_present_flag and pic_intra_slice_present_flag.
In embodiment 1, syntax elements related to the number of fusion mode candidates are indicated in the sequence parameter set (SPS), which makes it possible, in a specific implementation, to derive the number of non-rectangular fusion mode candidates (MaxNumGeoMergeCand) at the SPS level. This aspect may be implemented by an encoding process or a decoding process according to the following syntax:
7.3.2.3 sequence parameter set RBSP syntax
[syntax table not reproduced in this text]
The syntax described above has the following semantics:
sps_six_minus_max_num_merge_cand_plus1 equal to 0 means that pic_six_minus_max_num_merge_cand is present in PHs referring to the PPS, and sps_six_minus_max_num_merge_cand_plus1 greater than 0 means that pic_six_minus_max_num_merge_cand is not present in PHs referring to the PPS. The value of sps_six_minus_max_num_merge_cand_plus1 can range from 0 to 6 (inclusive).
sps_max_num_merge_cand_minus_max_num_geo_cand_plus1 equal to 0 means that pic_max_num_merge_cand_minus_max_num_geo_cand is present in PHs of slices referring to the PPS, and sps_max_num_merge_cand_minus_max_num_geo_cand_plus1 greater than 0 means that pic_max_num_merge_cand_minus_max_num_geo_cand is not present in PHs referring to the PPS. The value of sps_max_num_merge_cand_minus_max_num_geo_cand_plus1 can range from 0 to MaxNumMergeCand - 1.
The semantics of the corresponding elements in PH are as follows:
pic_six_minus_max_num_merge_cand specifies the maximum number of motion vector prediction (MVP) candidates supported in the slices associated with the PH, subtracted from 6. The maximum number of fused MVP candidates, MaxNumMergeCand, is derived as follows:
MaxNumMergeCand = 6 - pic_six_minus_max_num_merge_cand (85)
The value of MaxNumMergeCand can range from 1 to 6 (inclusive). When pic_six_minus_max_num_merge_cand is not present, its value is inferred to be equal to sps_six_minus_max_num_merge_cand_plus1 - 1.
pic_max_num_merge_cand_minus_max_num_geo_cand specifies the maximum number of geo fusion mode candidates supported in the slices associated with the picture header, subtracted from MaxNumMergeCand.
When pic_max_num_merge_cand_minus_max_num_geo_cand is not present, sps_geo_enabled_flag is equal to 1, and MaxNumMergeCand is greater than or equal to 2, pic_max_num_merge_cand_minus_max_num_geo_cand is inferred to be equal to sps_max_num_merge_cand_minus_max_num_geo_cand_plus1 - 1.
The maximum number of geo fusion mode candidates, MaxNumGeoMergeCand, is derived as follows:
MaxNumGeoMergeCand = MaxNumMergeCand - pic_max_num_merge_cand_minus_max_num_geo_cand (87)
When pic_max_num_merge_cand_minus_max_num_geo_cand is present, the value of MaxNumGeoMergeCand can range from 2 to MaxNumMergeCand (inclusive).
When pic_max_num_merge_cand_minus_max_num_geo_cand is not present, and sps_geo_enabled_flag is equal to 0 or MaxNumMergeCand is less than 2, MaxNumGeoMergeCand is set equal to 0.
When MaxNumGeoMergeCand is equal to 0, the geo fusion mode is not allowed to be used for the slices associated with the PH.
The optional syntax and semantics of this embodiment are as follows:
[syntax table not reproduced in this text]
sps_six_minus_max_num_merge_cand specifies the maximum number of motion vector prediction (MVP) candidates supported in the slices associated with the PH, subtracted from 6. The maximum number of fused MVP candidates, MaxNumMergeCand, is derived as follows:
MaxNumMergeCand = 6 - sps_six_minus_max_num_merge_cand (85)
The value of MaxNumMergeCand can range from 1 to 6 (inclusive).
sps_max_num_merge_cand_minus_max_num_geo_cand specifies the maximum number of geo fusion mode candidates supported in the slices associated with the picture header, subtracted from MaxNumMergeCand.
The maximum number of geo fusion mode candidates, MaxNumGeoMergeCand, is derived as follows:
MaxNumGeoMergeCand = MaxNumMergeCand - sps_max_num_merge_cand_minus_max_num_geo_cand (87)
When sps_max_num_merge_cand_minus_max_num_geo_cand is present, the value of MaxNumGeoMergeCand can range from 2 to MaxNumMergeCand (inclusive).
When sps_max_num_merge_cand_minus_max_num_geo_cand is not present, and sps_geo_enabled_flag is equal to 0 or MaxNumMergeCand is less than 2, MaxNumGeoMergeCand is set equal to 0.
When MaxNumGeoMergeCand is equal to 0, the geo fusion mode is not used.
For the above embodiment and the two alternative syntax definitions, it is checked whether weighted prediction is enabled. This check may affect the derivation of the variable MaxNumGeoMergeCand, and the value of MaxNumGeoMergeCand is set to 0 in one of the following cases:
- when, for i = 0..NumRefIdxActive[0] and j = 0..NumRefIdxActive[1], the values of luma_weight_l0_flag[i], chroma_weight_l0_flag[i], luma_weight_l1_flag[j] and chroma_weight_l1_flag[j] are all set to 0 or are absent;
- when a flag in the SPS or PPS indicates the presence of bi-directional weighted prediction (pps_weighted_bipred_flag);
- when the presence of bi-directional weighted prediction is indicated in the picture header (PH) or slice header (SH).
The SPS-level flag indicating the presence of weighted prediction parameters may be indicated as follows:
[syntax table not reproduced in this text]
The syntax element "sps_wp_enabled_flag" determines whether weighted prediction can be enabled at a lower level (PPS, PH or SH level). An exemplary implementation is described below:
[syntax table not reproduced in this text]
In the above table, pps_weighted_pred_flag and pps_weighted_bipred_flag are flags in the bitstream indicating whether weighted prediction is enabled for uni-directionally predicted blocks and bi-directionally predicted blocks, respectively.
In one embodiment, when the weighted prediction flags are indicated in the picture header as pic_weighted_pred_flag and pic_weighted_bipred_flag, etc., the following dependency on sps_wp_enabled_flag may be specified in the bitstream syntax:
...
if(sps_wp_enabled_flag){
pic_weighted_pred_flag
pic_weighted_bipred_flag
}
...
In one embodiment, when the weighted prediction flags are indicated in the slice header as weighted_pred_flag and weighted_bipred_flag, etc., the following dependency on sps_wp_enabled_flag may be specified in the bitstream syntax:
...
if(sps_wp_enabled_flag){
weighted_pred_flag
weighted_bipred_flag
}
...
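A minimal parsing sketch (Python; read_flag is an assumed callable that returns the next one-bit flag from the codestream, not part of the described syntax) of the dependency on sps_wp_enabled_flag shown above:

```python
def parse_weighted_prediction_flags(read_flag, sps_wp_enabled_flag):
    if sps_wp_enabled_flag:
        weighted_pred_flag = read_flag()
        weighted_bipred_flag = read_flag()
    else:
        # The lower-level flags are not signalled when the SPS disables WP.
        weighted_pred_flag = 0
        weighted_bipred_flag = 0
    return weighted_pred_flag, weighted_bipred_flag
```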
In embodiment 2, the reference picture list may be indicated in the PPS, or in the PH or SH (but not in both PH and SH). As can be appreciated from the state of the art, the indication of the reference picture list depends on syntax elements that indicate the presence of weighted prediction (e.g., pps_weighted_pred_flag and pps_weighted_bipred_flag). Thus, depending on whether the reference picture list is indicated in the PPS, PH or SH, the weighted prediction parameters are indicated before the reference picture list in the corresponding PPS, PH or SH.
For the present embodiment, the following syntax may be specified:
picture parameter set syntax
[syntax table not reproduced in this text]
……
rpl_present_in_ph_flag equal to 1 means that the reference picture list is not indicated in the slice header of the reference PPS but may be indicated in the PH of the reference PPS; rpl_present_in_ph_flag equal to 0 means that the reference picture list is not indicated in the PH of the reference PPS but may be indicated in the slice header of the reference PPS.
sao_present_in_ph_flag equal to 1 means that the syntax element for enabling the use of SAO is not present in the slice header of the reference PPS but may be present in the PH of the reference PPS; sao_present_in_ph_flag equal to 0 means that the syntax element for enabling the use of SAO is not present in the PH of the reference PPS but may be present in the slice header of the reference PPS.
alf_present_in_ph_flag equal to 1 means that the syntax element for enabling the use of ALF is not present in the slice header of the reference PPS but may be present in the PH of the reference PPS; alf_present_in_ph_flag equal to 0 means that the syntax element for enabling the use of ALF is not present in the PH of the reference PPS but may be present in the slice header of the reference PPS.
……
weighted_pred_table_present_in_ph_flag equal to 1 means that the weighted prediction table is not present in the slice header of the reference PPS but may be present in the PH of the reference PPS; weighted_pred_table_present_in_ph_flag equal to 0 means that the weighted prediction table is not present in the PH of the reference PPS but may be present in the slice header of the reference PPS. When weighted_pred_table_present_in_ph_flag is not present, its value is inferred to be equal to 0.
……
deblocking_filter_override_enabled_flag equal to 1 indicates that the deblocking filter override may be present in the PH or slice header of the reference PPS, and deblocking_filter_override_enabled_flag equal to 0 indicates that the deblocking filter override is not present in the PH or slice header of the reference PPS. When deblocking_filter_override_enabled_flag is not present, its value is inferred to be equal to 0.
deblocking_filter_override_present_in_ph_flag equal to 1 means that the deblocking filter override is not present in the slice header of the reference PPS but may be present in the PH of the reference PPS; deblocking_filter_override_present_in_ph_flag equal to 0 means that the deblocking filter override is not present in the PH of the reference PPS but may be present in the slice header of the reference PPS.
……
[syntax tables not reproduced in this text]
An alternative syntax for the picture header is as follows:
[syntax table not reproduced in this text]
in embodiment 3, the indications of the picture header and slice header elements may be combined in a single process.
A flag indicating whether a picture header and a slice header are combined ("picture_header_in_slice_header_flag") is introduced in the present embodiment. According to this embodiment, the syntax of the codestream is as follows:
picture header RBSP syntax
[syntax table not reproduced in this text]
Syntax of picture header structure
[syntax table not reproduced in this text]
General slice header syntax
[syntax table not reproduced in this text]
The semantics of picture_header_in_slice_header_flag and the associated code stream constraints are as follows:
picture_header_in_slice_header_flag equal to 1 indicates that the picture header syntax structure is present in the slice header, and picture_header_in_slice_header_flag equal to 0 indicates that the picture header syntax structure is not present in the slice header.
The requirement of code stream consistency is as follows: the value of picture_header_in_slice_header_flag is the same in all slices of the CLVS.
When picture_header_in_slice_header_flag is equal to 1, the requirement of code stream consistency is as follows: no NAL unit with NAL unit type equal to PH_NUT is present in the CLVS.
When picture_header_in_slice_header_flag is equal to 0, the requirement of code stream consistency is as follows: a NAL unit with NAL unit type equal to PH_NUT is present in the PU and precedes the first VCL NAL unit of the PU.
The combination of embodiment 2 and embodiment 3 is also very important, since both of these embodiments relate to PH indication and SH indication.
The combination of the various aspects of these embodiments is as follows:
when picture _ header _ in _ slice _ header _ flag is equal to 0 (embodiment 4), these flags indicate whether syntax elements of the relevant coding tool are present in PH or SH (but cannot be present in both PH and SH).
Otherwise, when picture _ header _ in _ slice _ header _ flag is equal to 1, these flags are inferred to be 0, indicating that the indication of tool parameters is made at the slice level.
An alternative combination is as follows:
when picture _ header _ in _ slice _ header _ flag is equal to 0 (embodiment 4), these flags indicate whether syntax elements of the relevant coding tool are present in PH or SH (but cannot be present in both PH and SH).
Otherwise, when picture _ header _ in _ slice _ header _ flag is equal to 1, these flags are inferred to be 0, indicating that the indication of the tool parameters is made at the picture header level.
The combination has the following syntax:
picture parameter set syntax
[syntax table not reproduced in this text]
In embodiment 4, whether weighted prediction is enabled is checked by means of an indication of the number of entries in the reference picture list to which weighted prediction is applied.
The syntax and semantics in this embodiment are defined as follows:
[syntax table not reproduced in this text]
……
num _ l0_ weighted _ ref _ pics represents the number of reference pictures in the reference picture list0 that are weighted. num _ l0_ weighted _ ref _ pics can range from 0 to MaxDecPicBuffMinus1+14 (including head to tail).
The requirement of code stream consistency is as follows: when num _ L0_ weighted _ ref _ pics is present, the value of num _ L0_ weighted _ ref _ pics may not be less than the number of active reference pictures of L0 for any slice in the picture associated with the picture header.
num _ l1_ weighted _ ref _ pics represents the number of reference pictures in reference picture list1 to be weighted. num _ l1_ weighted _ ref _ pics can range from 0 to MaxDecPicBuffMinus1+14 (including head to tail).
The requirement of code stream consistency is as follows: when num _ L1_ weighted _ ref _ pics is present, the value of num _ L1_ weighted _ ref _ pics may not be less than the number of active reference pictures of L1 for any slice in the picture associated with the picture header.
……
MaxNumGeoMergeCand is set to 0 when num_l0_weighted_ref_pics or num_l1_weighted_ref_pics is not 0. The following syntax is an example of how this dependency can be exploited:
[syntax table not reproduced in this text]
The semantics of pic_max_num_merge_cand_minus_max_num_geo_cand in the present embodiment are the same as those in the above-described embodiment.
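The dependency described in this embodiment can be sketched as follows (Python; the function and argument names mirror the syntax elements above and are illustrative only):

```python
def slice_max_num_geo_merge_cand(num_l0_weighted_ref_pics,
                                 num_l1_weighted_ref_pics,
                                 max_num_geo_merge_cand):
    # The geo fusion mode is switched off as soon as either reference picture
    # list contains at least one weighted reference picture.
    if num_l0_weighted_ref_pics > 0 or num_l1_weighted_ref_pics > 0:
        return 0
    return max_num_geo_merge_cand
```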
In embodiment 5, inter and intra related syntax elements will be conditionally indicated if only certain slice types are used in pictures associated with the PH.
The syntax of this embodiment is detailed below:
[syntax table not reproduced in this text]
7.3.7.1 General slice header syntax
[syntax table not reproduced in this text]
7.4.3.6 Picture header RBSP semantics
pic_inter_slice_present_flag equal to 1 means that one or more slices with slice_type equal to 0 (B) or 1 (P) may be present in the picture associated with the PH, and pic_inter_slice_present_flag equal to 0 means that no slices with slice_type equal to 0 (B) or 1 (P) are present in the picture associated with the PH.
pic_intra_slice_present_flag equal to 1 means that one or more slices with slice_type equal to 2 (I) may be present in the picture associated with the PH, and pic_intra_slice_present_flag equal to 0 means that no slices with slice_type equal to 2 (I) are present in the picture associated with the PH. When pic_intra_slice_present_flag is not present, its value is inferred to be equal to 1.
NOTE: In the picture header associated with a picture containing one or more sub-pictures that contain one or more intra-coded slices, where these sub-pictures may be merged with one or more sub-pictures containing one or more inter-coded slices, the values of pic_inter_slice_present_flag and pic_intra_slice_present_flag are both set equal to 1.
7.4.8.1 General slice header semantics
slice_type specifies the coding type of the slice, as shown in Table 7-5.
Table 7-5: Association between slice_type and the name of slice_type
slice_type    Name of slice_type
0             B (B slice)
1             P (P slice)
2             I (I slice)
When the value of nal_unit_type ranges from IDR_W_RADL to CRA_NUT (inclusive) and the current picture is the first picture in an access unit, slice_type should be equal to 2.
When slice_type is not present, its value is inferred to be equal to 2.
When pic_intra_slice_present_flag is equal to 0, the value of slice_type can range from 0 to 1 (inclusive).
Embodiment 4 can be used in combination with an indication of pred_weight_table() in the picture header. The above embodiment discloses that pred_weight_table() is indicated in the picture header.
An exemplary syntax is as follows:
[syntax table not reproduced in this text]
When it is indicated that pred_weight_table() is present in the picture header, the following syntax can be used in combination with embodiment 3:
[syntax table not reproduced in this text]
the following syntax may be used in some alternative embodiments:
[syntax table not reproduced in this text]
the following syntax may be used in some alternative embodiments:
[syntax table not reproduced in this text]
In the above syntax, pic_inter_bipred_slice_present_flag indicates that all slice types referring to the picture header are present: I slices, B slices and P slices.
When pic_inter_bipred_slice_present_flag is 0, the picture includes only I-type or P-type slices.
In this case, the non-rectangular mode is not enabled.
The features disclosed above are combined in embodiment 5. An exemplary syntax is as follows:
[syntax table not reproduced in this text]
In embodiment 6, the encoder is allowed to select a non-rectangular (e.g., GEO) mode only when the referenced pictures do not use a weighted predictor.
In the present embodiment, semantics are defined as follows:
7.4.10.7 Fusion data semantics
……
The variable MergeGeoFlag[x0][y0], which specifies whether geo-shape-based motion compensation is used to generate the prediction samples of the current coding unit when decoding a B slice, is derived as follows:
- MergeGeoFlag[x0][y0] is set equal to 1 if all of the following conditions are true:
- sps_geo_enabled_flag is equal to 1;
- slice_type is equal to B;
- general_merge_flag[x0][y0] is equal to 1;
- MaxNumGeoMergeCand is greater than or equal to 2;
- cbWidth is greater than or equal to 8;
- cbHeight is greater than or equal to 8;
- cbWidth is less than 8 * cbHeight;
- cbHeight is less than 8 * cbWidth;
- regular_merge_flag[x0][y0] is equal to 0;
- merge_subblock_flag[x0][y0] is equal to 0;
- ciip_flag[x0][y0] is equal to 0.
- Otherwise, MergeGeoFlag[x0][y0] is set equal to 0.
The requirement of code stream consistency is as follows: MergeGeoFlag[x0][y0] may be equal to 0 if one of the luma explicit weighting flags or chroma explicit weighting flags of the CU is true.
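The conditions above, together with the code stream consistency requirement, can be summarized with the following sketch (Python; the argument names and the string slice-type representation are illustrative assumptions):

```python
def derive_merge_geo_flag(sps_geo_enabled_flag, slice_type, general_merge_flag,
                          max_num_geo_merge_cand, cb_width, cb_height,
                          regular_merge_flag, merge_subblock_flag, ciip_flag,
                          luma_weighted, chroma_weighted):
    geo = (sps_geo_enabled_flag and slice_type == 'B' and general_merge_flag
           and max_num_geo_merge_cand >= 2
           and cb_width >= 8 and cb_height >= 8
           and cb_width < 8 * cb_height and cb_height < 8 * cb_width
           and not regular_merge_flag and not merge_subblock_flag
           and not ciip_flag)
    # Consistency requirement: geo is not used when an explicit luma or chroma
    # weighting flag of the CU is true.
    return 1 if geo and not (luma_weighted or chroma_weighted) else 0
```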
Embodiment 7 can be described as part of the VVC specification as follows:
8.5.7 Geo inter Block decoding Process
8.5.7.1 overview
This process is invoked when decoding a coding unit with MergeGeoFlag[xCb][yCb] equal to 1.
The inputs to the process include:
-a luma position (xCb, yCb) representing the position of the top left luma sample of the current coding block relative to the top left luma sample of the current picture;
-a variable cbWidth, representing the width of the current coding block, in units of luma samples;
-a variable cbHeight, representing the height of the current coding block, in units of luma samples;
-1/16 fractional-sample-precision luma motion vectors mvA and mvB;
-chrominance motion vectors mvCA and mvCB;
-reference indices refIdxA and refIdxB;
- the prediction list flags predListFlagA and predListFlagB.
……
Let predSamplesLAL and predSamplesLBL be (cbWidth) x (cbHeight) arrays of predicted luma sample values, and let predSamplesLACb, predSamplesLBCb, predSamplesLACr and predSamplesLBCr be (cbWidth / SubWidthC) x (cbHeight / SubHeightC) arrays of predicted chroma sample values.
predSamplesL, predSamplesCb and predSamplesCr are derived by performing the following steps in order:
1. For N being each of A and B, the following applies:
……
2. The variables angleIdx and distanceIdx, which specify the partition angle and distance of the geo fusion mode, are set according to the value of merge_geo_partition_idx[xCb][yCb] as specified in Table 36.
3. The variable weightedFlag is derived as follows:
lumaWeightedFlagA = predListFlagA ? luma_weight_l1_flag[refIdxA] : luma_weight_l0_flag[refIdxA]
lumaWeightedFlagB = predListFlagB ? luma_weight_l1_flag[refIdxB] : luma_weight_l0_flag[refIdxB]
chromaWeightedFlagA = predListFlagA ? chroma_weight_l1_flag[refIdxA] : chroma_weight_l0_flag[refIdxA]
chromaWeightedFlagB = predListFlagB ? chroma_weight_l1_flag[refIdxB] : chroma_weight_l0_flag[refIdxB]
weightedFlag = lumaWeightedFlagA || lumaWeightedFlagB || chromaWeightedFlagA || chromaWeightedFlagB
4. The prediction samples predSamplesL[xL][yL] (with xL = 0..cbWidth - 1 and yL = 0..cbHeight - 1) within the current luma coding block are derived as follows: if weightedFlag is equal to 0, the weighted sample prediction process for the geo fusion mode specified in section 8.5.7.2 is invoked; if weightedFlag is equal to 1, the explicit weighted sample prediction process specified in section 8.5.6.6.3 is invoked, with the coding block width nCbW set equal to cbWidth, the coding block height nCbH set equal to cbHeight, the sample arrays predSamplesLAL and predSamplesLBL, the variables angleIdx and distanceIdx, and cIdx set equal to 0 as inputs.
5. The prediction samples predSamplesCb[xC][yC] (with xC = 0..cbWidth / SubWidthC - 1 and yC = 0..cbHeight / SubHeightC - 1) within the current chroma component Cb coding block are derived as follows: if weightedFlag is equal to 0, the weighted sample prediction process for the geo fusion mode specified in section 8.5.7.2 is invoked; if weightedFlag is equal to 1, the explicit weighted sample prediction process specified in section 8.5.6.6.3 is invoked, with the coding block width nCbW set equal to cbWidth / SubWidthC, the coding block height nCbH set equal to cbHeight / SubHeightC, the sample arrays predSamplesLACb and predSamplesLBCb, the variables angleIdx and distanceIdx, and cIdx set equal to 1 as inputs.
6. The prediction samples predSamplesCr[xC][yC] (with xC = 0..cbWidth / SubWidthC - 1 and yC = 0..cbHeight / SubHeightC - 1) within the current chroma component Cr coding block are derived as follows: if weightedFlag is equal to 0, the weighted sample prediction process for the geo fusion mode specified in section 8.5.7.2 is invoked; if weightedFlag is equal to 1, the explicit weighted sample prediction process specified in section 8.5.6.6.3 is invoked, with the coding block width nCbW set equal to cbWidth / SubWidthC, the coding block height nCbH set equal to cbHeight / SubHeightC, the sample arrays predSamplesLACr and predSamplesLBCr, the variables angleIdx and distanceIdx, and cIdx set equal to 2 as inputs.
7. The motion vector storing process for the geo fusion mode specified in section 8.5.7.3 is invoked, with the luma coding block location (xCb, yCb), the luma coding block width cbWidth, the luma coding block height cbHeight, the partition direction variables angleIdx and distanceIdx, the luma motion vectors mvA and mvB, the reference indices refIdxA and refIdxB, and the prediction list flags predListFlagA and predListFlagB as inputs.
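Steps 4 to 6 above share the same dispatch on weightedFlag; the following sketch (Python; the two process arguments stand for the procedures of sections 8.5.7.2 and 8.5.6.6.3 and are assumed callables) illustrates it for one component:

```python
def blend_geo_partitions(weighted_flag, geo_weighted_sample_pred,
                         explicit_weighted_sample_pred,
                         pred_a, pred_b, angle_idx, distance_idx):
    if weighted_flag == 0:
        # Neither partition uses explicit weights: normal geo blending (8.5.7.2).
        return geo_weighted_sample_pred(pred_a, pred_b, angle_idx, distance_idx)
    # At least one partition uses explicit weighted prediction (8.5.6.6.3).
    return explicit_weighted_sample_pred(pred_a, pred_b)
```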
Table 36: specification of angleIdx and distanceIdx values based on merge _ geo _ partition _ idx values
[table not reproduced in this text]
8.5.6.6.3 explicit weighted sample prediction process
The inputs to the process include:
two variables nCbW and nCbH, representing the width and height of the current coding block;
- two (nCbW) x (nCbH) arrays predSamplesL0 and predSamplesL1;
- the prediction list usage flags predFlagL0 and predFlagL1;
- the reference indices refIdxL0 and refIdxL1;
-a variable cIdx representing a color component index;
-a sample bit depth bitDepth.
The output of this process includes an (nCbW) × (nCbH) array pbSamples composed of predicted sample values.
The variable shift1 is set equal to Max (2, 14-bitDepth).
The variables log2Wd, o0, o1, w0 and w1 are derived as follows:
if the cIdx of the luminance sample is equal to 0, then the following applies:
log2Wd=luma_log2_weight_denom+shift1 (1010)
w0=LumaWeightL0[refIdxL0] (1011)
w1=LumaWeightL1[refIdxL1] (1012)
o0=luma_offset_l0[refIdxL0]<<(bitDepth-8) (1013)
o1=luma_offset_l1[refIdxL1]<<(bitDepth-8) (1014)
otherwise, if the cIdx of the chroma samples is not equal to 0, the following applies:
log2Wd=ChromaLog2WeightDenom+shift1 (1015)
w0=ChromaWeightL0[refIdxL0][cIdx-1] (1016)
w1=ChromaWeightL1[refIdxL1][cIdx-1] (1017)
o0=ChromaOffsetL0[refIdxL0][cIdx-1]<<(bitDepth-8) (1018)
o1=ChromaOffsetL1[refIdxL1][cIdx-1]<<(bitDepth-8) (1019)
The prediction sample pbSamples[x][y] is derived as follows, where x = 0..nCbW - 1 and y = 0..nCbH - 1:
- If predFlagL0 is equal to 1 and predFlagL1 is equal to 0, the prediction sample value is derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,((predSamplesL0[x][y]*w0+(1<<(log2Wd-1)))>>log2Wd)+o0) (1020)
- Otherwise, if predFlagL0 is equal to 0 and predFlagL1 is equal to 1, the prediction sample value is derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,((predSamplesL1[x][y]*w1+(1<<(log2Wd-1)))>>log2Wd)+o1) (1021)
- Otherwise, if predFlagL0 is equal to 1 and predFlagL1 is equal to 1, the prediction sample value is derived as follows:
pbSamples[x][y]=Clip3(0,(1<<bitDepth)-1,(predSamplesL0[x][y]*w0+predSamplesL1[x][y]*w1+((o0+o1+1)<<log2Wd))>>(log2Wd+1)) (1022)
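Equation (1022) for the bi-predictive case can be illustrated per sample as follows (Python; the function name and the scalar interface are illustrative only):

```python
def explicit_weighted_bipred_sample(p0, p1, w0, w1, o0, o1, log2wd, bit_depth):
    # pbSamples[x][y] of equation (1022) for a single pair of prediction samples.
    val = (p0 * w0 + p1 * w1 + ((o0 + o1 + 1) << log2wd)) >> (log2wd + 1)
    return max(0, min((1 << bit_depth) - 1, val))   # Clip3(0, 2^bitDepth - 1, val)
```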
The syntax for the fusion data parameters is disclosed in embodiment 8, which includes checking variables that indicate the presence of a non-rectangular fusion mode (e.g., the GEO mode). An exemplary syntax is as follows:
[syntax table not reproduced in this text]
the variable MaxNumGeoMergeCand is derived according to any of the embodiments described above.
An optional variable SliceMaxNumGeoMergeCand, which is derived from the variable MaxNumGeoMergeCand, is used in embodiment 9. The value of MaxNumGeoMergeCand is obtained at a higher indication level (i.e. the PH, PPS or SPS level).
In this embodiment, SliceMaxNumGeoMergeCand is derived from the value of MaxNumGeoMergeCand and additional checks performed on the slice.
For example, SliceMaxNumGeoMergeCand = (num_l0_weighted_ref_pics > 0 || num_l1_weighted_ref_pics > 0) ? 0 : MaxNumGeoMergeCand.
Embodiment 10 is a variation of embodiment 9 in which the following expression is used to determine the value of SliceMaxNumGeoMergeCand:
SliceMaxNumGeoMergeCand=(!pic_inter_slice_present_flag)?0:MaxNumGeoMergeCand
example 11 was used in combination with example 4.
The syntax table is defined as follows:
[syntax table not reproduced in this text]
The variable SliceMaxNumGeoMergeCand is derived as follows:
SliceMaxNumGeoMergeCand=(!pic_inter_bipred_slice_present_flag)?0:MaxNumGeoMergeCand
in accordance with the foregoing description, the following examples are provided herein, in particular.
Fig. 15 is a flowchart of a decoding method according to an embodiment of the present invention.
The method shown in fig. 15 includes: step 1510: a codestream of a current image (e.g., an encoded video sequence) is obtained. Step 1520: obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a first indicator of the current picture from the codestream, wherein the first indicator indicates a slice type (of a slice of the current picture). Further, the method comprises: step 1530: obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a second indicator of the current picture from the codestream, wherein the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream.
Then, step 1540: when the value of the first indicator is equal to a first preset value (e.g., 1) and the value of the second indicator is equal to a second preset value (e.g., 2) (the second preset value is used for indicating that a weighted prediction parameter exists in a picture header or a slice header of the code stream), resolving the value of the weighted prediction parameter of a current block from the code stream, wherein the current block is contained in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value. Then, step 1550: and predicting the current block according to the analyzed value of the weighted prediction parameter (namely generating a prediction block for the current block).
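The condition checked in step 1540 can be sketched as follows (Python; the preset constants are illustrative assumptions, the text gives the values 1 and 2 only as examples):

```python
FIRST_PRESET = 1    # example value for the first indicator (slice type)
SECOND_PRESET = 1   # example value for the second indicator

def weighted_prediction_params_present(first_indicator, second_indicator):
    # Step 1540: the block-level WP parameters are parsed only when both
    # indicators are equal to their preset values.
    return (first_indicator == FIRST_PRESET
            and second_indicator == SECOND_PRESET)
```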
Fig. 16a and 16b illustrate a decoding method according to another embodiment of the present invention.
The method comprises the following steps: step 1610: a codestream of a current image (e.g., an encoded video sequence) is obtained. Obtaining values for a plurality of indicators: thus, the method comprises: step 1620: obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a first indicator of the current picture from the codestream, wherein the first indicator indicates a slice type (of a slice of the current picture); step 1630: obtaining (e.g., by parsing a corresponding syntax element included in the codestream) a value of a second indicator of the current picture from the codestream, wherein the second indicator indicates whether a weighted prediction parameter is present in a picture header or a slice header of the codestream; step 1640: obtaining (e.g., by parsing a corresponding syntax element included in the code stream) a value of a third indicator of the current picture according to the code stream, wherein the third indicator indicates whether weighted prediction is applicable to an inter-slice whose slice type is a B-slice or a P-slice.
Then, step 1650: resolving a value of a weighted prediction parameter of a current block from the code stream when (a) a value of the first indicator is equal to a first preset value (e.g., 1), (b) a value of the second indicator is equal to a second preset value (e.g., 1), and (c) a value of the third indicator indicates that weighted prediction is applicable to the inter slice, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value. Then, step 1660: and predicting the current block according to the analyzed value of the weighted prediction parameter (namely generating a prediction block for the current block).
The above method may be implemented in a decoding device as described below.
There is provided a decoding apparatus 1700 as shown in fig. 17. The decoding apparatus 1700 includes a codestream obtaining unit 1710, an indicator value obtaining unit 1720, a parsing unit 1730, and a prediction unit 1740.
The code stream obtaining unit 1710 is configured to obtain a code stream of a current image.
According to an embodiment, the indicator value obtaining unit 1720 is configured to:
(a) obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a stripe type;
(b) and obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a strip header of the code stream.
In this case, the parsing unit 1730 is configured to: when the value of the first indicator is equal to a first preset value (for example, 1) and the value of a second indicator is equal to a second preset value (for example, 1), parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current stripe of the current image, the first preset value is an integer value, and the second preset value is an integer value; the prediction unit is configured to predict the current block according to the value of the weighted prediction parameter.
According to a further embodiment, the indicator value obtaining unit 1720 is configured to:
(a) obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a stripe type;
(b) obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a stripe header of the code stream;
(c) and obtaining a value of a third indicator of the current image according to the code stream, wherein the third indicator indicates whether weighted prediction is applicable to an inter-frame strip, and the type of the inter-frame strip is a B strip or a P strip.
In this case, the parsing unit 1730 is configured to: when the value of the first indicator is equal to a first preset value (e.g., 1), the value of the second indicator is equal to a second preset value (e.g., 1), and the value of the third indicator indicates that weighted prediction is applicable to the inter-slice, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value; the prediction unit 1740 is configured to predict the current block according to the value of the weighted prediction parameter.
The decoding apparatus 1700 shown in fig. 17 may be or may include the decoder 30 shown in fig. 1A, 1B, and 3 and the video decoder 3206 shown in fig. 11. The decoding apparatus 1700 shown in fig. 17 may be included in the video coding apparatus 400 shown in fig. 4, the device 500 shown in fig. 5, the terminal apparatus 3106 shown in fig. 10, the device 1300 shown in fig. 13, and the device 1400 shown in fig. 14.
Applications of the encoding method and the decoding method described in the above embodiments and a system using the applications are explained below.
Fig. 10 is a block diagram of a content providing system 3100 for implementing a content distribution service. The content providing system 3100 includes a capture device 3102, a terminal device 3106, and optionally a display 3126. Capture device 3102 communicates with terminal device 3106 over communication link 3104. The communication link may include the communication channel 13 described above. Communication link 3104 includes, but is not limited to, Wi-Fi, Ethernet, cable, wireless (3G/4G/5G), USB, or any kind of combination thereof, and the like.
The capture device 3102 generates data and may encode the data by the encoding method shown in the above embodiments. Alternatively, the capture device 3102 may distribute the data to a streaming server (not shown in the figure) that encodes the data and transmits the encoded data to the terminal device 3106. Capture device 3102 includes, but is not limited to, a camera, a smart phone or tablet, a computer or laptop, a video conferencing system, a PDA, an in-vehicle device, or any combination thereof, and the like. For example, capture device 3102 may include source device 12 as described above. When the data includes video, the video encoder 20 included in the capturing device 3102 may actually perform a video encoding process. When the data includes audio (i.e., sound), an audio encoder included in the capture device 3102 may actually perform an audio encoding process. For some practical scenarios, capture device 3102 distributes the encoded video data and the encoded audio data by multiplexing the encoded video data and the encoded audio data together. For other practical scenarios, such as in a video conferencing system, encoded audio data and encoded video data are not multiplexed. The capture device 3102 distributes the encoded audio data and the encoded video data to the terminal device 3106, respectively.
In the content providing system 3100, a terminal device 3106 receives and reproduces encoded data. The terminal device 3106 may be a device with data receiving and restoring capabilities, such as a smart phone or tablet 3108, a computer or notebook 3110, a Network Video Recorder (NVR)/Digital Video Recorder (DVR) 3112, a Television (TV)3114, a Set Top Box (STB) 3116, a video conferencing system 3118, a video surveillance system 3120, a Personal Digital Assistant (PDA) 3122, a vehicle device 3124, or any combination thereof, or a device capable of decoding the encoded data. For example, terminal device 3106 may include destination device 14 as described above. When the encoded data includes video, the video decoder 30 included in the terminal device preferentially performs the video decoding process. When the encoded data includes audio, an audio decoder included in the terminal device preferentially performs audio decoding processing.
For terminal devices with displays, such as a smart phone or tablet 3108, a computer or laptop 3110, a Network Video Recorder (NVR)/Digital Video Recorder (DVR) 3112, a Television (TV)3114, a Personal Digital Assistant (PDA) 3122, or a vehicle device 3124, the terminal device may feed decoded data to the display of the terminal device. For non-display equipped terminal devices, such as STB 3116, video conferencing system 3118 or video surveillance system 3120, an external display 3126 should be connected to receive and display the decoded data.
When each device in the system performs encoding or decoding, an image encoding device or an image decoding device as shown in the above embodiments may be used.
Fig. 11 is a diagram illustrating an exemplary structure of the terminal device 3106. After the terminal device 3106 receives the stream from the capture device 3102, a protocol processing unit 3202 analyzes the transmission protocol of the stream. The protocol includes, but is not limited to, Real Time Streaming Protocol (RTSP), Hypertext Transfer Protocol (HTTP), HTTP Live Streaming protocol (HLS), MPEG-DASH, Real-Time Transport Protocol (RTP), Real Time Messaging Protocol (RTMP), or any kind of combination thereof.
The protocol processing unit 3202 generates a stream file after processing the stream. The file is output to the demultiplexing unit 3204. The demultiplexing unit 3204 may separate the multiplexed data into encoded audio data and encoded video data. As described above, for some practical scenarios, such as in a video conferencing system, encoded audio data and encoded video data are not multiplexed. In this case, the encoded data may be transmitted to the video decoder 3206 and the audio decoder 3208 without using the demultiplexing unit 3204.
Through the demultiplexing process, a video Elementary Stream (ES), an audio ES, and optionally a subtitle are generated. The video decoder 3206 includes the video decoder 30 described in the above embodiment, decodes the video ES by the decoding method shown in the above embodiment to generate a video frame, and feeds this data to the synchronization unit 3212. The audio decoder 3208 decodes the audio ES to generate audio frames, and feeds this data to the synchronization unit 3212. Alternatively, the video frames may be stored in a buffer (not shown in fig. 11) before being fed to the synchronization unit 3212. Similarly, the audio frames may be stored in a buffer (not shown in fig. 11) before being fed to the synchronization unit 3212.
The synchronization unit 3212 synchronizes the video frames and the audio frames and provides the video/audio to the video/audio display 3214. For example, the synchronization unit 3212 synchronizes presentation of video information and audio information. The information may be coded in the syntax using timestamps related to the presentation of the coded audio and visual data and timestamps related to the delivery of the data stream itself.
If subtitles are included in the stream, the subtitle decoder 3210 decodes the subtitles, synchronizes the subtitles with video frames and audio frames, and provides the video/audio/subtitles to the video/audio/subtitle display 3216.
The present invention is not limited to the above-described system, and the image encoding apparatus or the image decoding apparatus in the above-described embodiments may be incorporated into other systems, such as an automobile system.
Mathematical operators
The mathematical operators used in this application are similar to those used in the C programming language. However, the present application precisely defines the results of integer division operations and arithmetic shift operations, and also defines other operations such as exponentiation and real-valued division. Numbering and counting conventions typically start from 0, e.g., "first" corresponds to the 0th, "second" to the 1st, and so on.
Arithmetic operator
The arithmetic operator is defined as follows:
+ Addition operation
- Subtraction (as a two-argument operator) or negation (as a unary prefix operator)
* Multiplication operation, including matrix multiplication
x^y Exponentiation, representing x to the power of y. In other contexts, such notation is used as a superscript and is not to be interpreted as exponentiation.
/ Integer division operation with truncation of the result toward zero. For example, 7/4 and -7/-4 are truncated to 1, and -7/4 and 7/-4 are truncated to -1 (see the sketch following this list).
Div is used to represent the division operation in a mathematical equation, where the result is not truncated or rounded.
x/y (written as a fraction, with x above y) is used to represent a division operation in a mathematical equation without performing a truncation or rounding operation on the result.
Σ f(i), with i going from x to y, is used to calculate the sum of f(i), where i takes all integer values between x and y, including x and y.
x % y Modulus operation, representing the remainder of x divided by y, where x and y are both integers, x >= 0 and y > 0.
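To make the truncation behavior of the "/" and "%" operators above concrete, the following is a minimal sketch in Python (a non-normative illustration; the helper names are assumptions). Python's built-in "//" floors toward negative infinity, so truncation toward zero is emulated explicitly.

def spec_int_div(x, y):
    """Integer division with the result truncated toward zero, as defined for "/"."""
    q = abs(x) // abs(y)
    return q if (x < 0) == (y < 0) else -q

def spec_mod(x, y):
    """Modulus as defined above: remainder of x divided by y, with x >= 0 and y > 0."""
    assert x >= 0 and y > 0
    return x % y  # identical to Python's "%" under these constraints

assert spec_int_div(7, 4) == 1 and spec_int_div(-7, -4) == 1
assert spec_int_div(-7, 4) == -1 and spec_int_div(7, -4) == -1
assert spec_mod(7, 4) == 3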
Logical operators
The logical operators are defined as follows:
x && y Boolean logic AND operation of x and y
x || y Boolean logic OR operation of x and y
! Boolean logic NOT operation
x ? y : z If x is TRUE or not equal to 0, the value of y is returned; otherwise, the value of z is returned.
Relational operators
The definition of the relational operator is as follows:
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
When a relational operator is applied to a syntax element or variable that has been assigned the value "na" (not applicable), the value "na" is treated as a distinct value for that syntax element or variable. The value "na" is considered not equal to any other value.
Bitwise operator
The definition of the bitwise operator is as follows:
& Bitwise AND operation. When an integer parameter is operated on, the two's complement representation of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
| Bitwise OR operation. When an integer parameter is operated on, the two's complement representation of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
^ Bitwise exclusive OR (XOR) operation. When an integer parameter is operated on, the two's complement representation of the integer value is operated on. When operating on a binary parameter, if the binary parameter contains fewer bits than another parameter, the shorter parameter is extended by adding more significant bits equal to 0.
x >> y arithmetically shifts the two's complement integer representation of x to the right by y binary bits. The function is defined only if y is a non-negative integer value. The result of the right shift is that the bits shifted into the Most Significant Bit (MSB) are equal to the MSB of x before the shift operation.
x << y arithmetically shifts the two's complement integer representation of x to the left by y binary bits. The function is defined only when y is a non-negative integer value. The result of the left shift is that the bit shifted into the Least Significant Bit (LSB) is equal to 0.
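As a small non-normative illustration of the shift definitions above, Python's built-in shifts on unbounded integers reproduce this behavior: a right shift of a negative value sign-extends, matching the MSB rule, and a left shift fills the least significant bits with 0.

x = -7            # two's complement ...11111001
print(x >> 1)     # -4: the bit shifted in equals the MSB (sign bit) of x
print(x << 2)     # -28: bits shifted into the LSB positions are equal to 0
print(13 >> 2)    # 3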
Assignment operators
The assignment operator is defined as follows:
= Assignment operator
++ Increment operation, i.e., x++ is equivalent to x = x + 1; when used in an array index, the value of the variable before the increment operation is used.
-- Decrement operation, i.e., x-- is equivalent to x = x - 1; when used in an array index, the value of the variable before the decrement operation is used.
+= Increment by the specified amount, e.g., x += 3 is equivalent to x = x + 3, and x += (-3) is equivalent to x = x + (-3).
-= Decrement by the specified amount, e.g., x -= 3 is equivalent to x = x - 3, and x -= (-3) is equivalent to x = x - (-3).
Range representation the following representation is used to illustrate the range of values:
x = y..z x takes integer values from y to z (including y and z), where x, y and z are all integers and z is greater than y.
Mathematical function
The mathematical function is defined as follows:
Abs(x) = x if x >= 0; Abs(x) = -x if x < 0 (the absolute value of x).
Asin(x) trigonometric inverse sine (arcsine) function, operating on a parameter x in the range of -1.0 to 1.0 (inclusive), with an output value in the range of -pi ÷ 2 to pi ÷ 2 (inclusive), in radians.
Atan(x) trigonometric inverse tangent (arctangent) function, operating on a parameter x, with an output value in the range of -pi ÷ 2 to pi ÷ 2 (inclusive), in radians.
Atan2(y, x) = Atan(y/x) if x > 0; Atan(y/x) + pi if x < 0 and y >= 0; Atan(y/x) - pi if x < 0 and y < 0; +pi ÷ 2 if x == 0 and y >= 0; -pi ÷ 2 otherwise.
Ceil (x) represents the smallest integer greater than or equal to x.
Clip1Y(x) = Clip3(0, (1 << BitDepthY) - 1, x)
Clip1C(x) = Clip3(0, (1 << BitDepthC) - 1, x)
Clip3(x, y, z) = x if z < x; y if z > y; z otherwise.
Cos (x) trigonometric cosine function, which operates on the parameter x in radians.
Floor (x) represents the largest integer less than or equal to x.
GetCurrMsb(a, b, c, d) = c + d if b - a >= d ÷ 2; c - d if a - b > d ÷ 2; c otherwise.
Ln(x) returns the natural logarithm of x (the base-e logarithm, where e is the natural logarithm base constant 2.718 281 828...).
Log2(x) returns the base 2 logarithm of x.
Log10(x) returns the base 10 logarithm of x.
Min(x, y) = x if x <= y; y otherwise.
Max(x, y) = x if x >= y; y otherwise.
Round(x)=Sign(x)*Floor(Abs(x)+0.5)
Sign(x) = 1 if x > 0; 0 if x == 0; -1 if x < 0.
Sin (x) trigonometric sine function, which operates on the parameter x in radians.
Sqrt(x) represents the square root of x.
Swap(x,y)=(y,x)
Tan (x) a trigonometric tangent function, which operates on the parameter x in radians.
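A few of the functions above translate directly into code. The following Python sketch (non-normative; the bit depth value is only an example) implements Sign, Clip3, Clip1Y and Round as defined:

import math

def Sign(x):
    return 1 if x > 0 else (0 if x == 0 else -1)

def Clip3(x, y, z):
    # Clamp z to the range [x, y].
    return x if z < x else (y if z > y else z)

def Clip1Y(x, BitDepthY=10):
    # Clip a luma sample to its valid range; BitDepthY = 10 is only an example value.
    return Clip3(0, (1 << BitDepthY) - 1, x)

def Round(x):
    return Sign(x) * math.floor(abs(x) + 0.5)

assert Clip3(0, 255, 300) == 255
assert Round(2.5) == 3 and Round(-2.5) == -3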
Operation priority order
When parentheses are not used to explicitly represent the order of priority in the expression, the following rule applies:
-the operation of high priority is performed before any operation of low priority.
Operations of the same priority proceed sequentially from left to right.
The priority of the operations is shown in the following table in order from highest to lowest; a higher position in the table indicates a higher priority.
If these operators are also used in the C programming language, the priority order employed herein is the same as that employed in the C programming language.
Table (b): Operation priority from highest (table top) to lowest (table bottom)
(The table lists the operators defined above in decreasing priority, following the C-language precedence order: postfix increment and decrement; unary negation and logical NOT; exponentiation; the multiplicative operators (*, /, ÷, %); the additive operators and the summation; the shift operators; the relational operators; the equality operators; bitwise AND, XOR and OR; logical AND; logical OR; the conditional operator x ? y : z; the range operator x..y; and, lowest, the assignment operators.)
Text description of logical operations
In the text, the following logical operation statements are described in mathematical form:
if( condition 0 )
    statement 0
else if( condition 1 )
    statement 1
...
else /* informative remark on the remaining condition */
    statement n
the description may be made in the following manner:
… as follows / … the following applies:
if condition 0, statement 0
Else, if condition 1, statement 1
-……
Else (informative remark on the remaining condition), statement n
Every "if … … else in the text, if … … else, … …" statement is introduced by "… … as follows" or "… … as quasi" statements (followed by "if … …"). The last condition "if … … else, if … … else, … …" always follows one "else, … …". The intermediate "if … … else, if … … else, … …" statement may be identified by matching "… … as follows" or "… … as follows" with end content "else, … …".
In the text, the following logical operation statements are described in mathematical form:
if( condition 0a && condition 0b )
    statement 0
else if( condition 1a || condition 1b )
    statement 1
...
else
    statement n
the description may be made in the following manner:
… as follows / … the following applies:
statement 0 if the following conditions are both true:
condition 0a
Condition 0b
Else, statement 1 if one or more of the following conditions is true:
condition 1a
Condition 1b
-……
Else, statement n
In the text, the following logical operation statements are described in mathematical form:
if( condition 0 )
    statement 0
if( condition 1 )
    statement 1
the description may be made in the following manner:
when condition 0, statement 0
When condition 1, statement 1.
Embodiments of encoder 20 and decoder 30, etc. and the functions described herein with reference to encoder 20 and decoder 30, etc. may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code in a computer-readable medium or transmitted over a communication medium and executed by a hardware-based processing unit. The computer readable medium may include a computer readable storage medium, corresponding to a tangible medium, such as a data storage medium, or a communication medium including any medium that facilitates transfer of a computer program from one place to another (e.g., according to a communication protocol). In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques of the present invention. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Further, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source via a coaxial cable, fiber optic cable, twisted pair, and Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, and DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium described above. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but refer to non-transitory, tangible storage media. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), one or more general purpose microprocessors, one or more Application Specific Integrated Circuits (ASICs), one or more field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry, among others. Accordingly, the term "processor" as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the various functions described herein may be provided within dedicated hardware and/or software modules for encoding and decoding, or incorporated in a combined codec. Furthermore, these techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in various devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described herein to emphasize functional aspects of devices for performing the disclosed techniques, but these components, modules, or units do not necessarily need to be implemented by different hardware units. Indeed, as mentioned above, the various units may be combined in a codec hardware unit, in combination with suitable software and/or firmware, or provided by a collection of interoperative hardware units including one or more processors as mentioned above.
Other embodiments of the invention will be described below. It should be noted that the numbering used hereinafter does not necessarily have to be consistent with the numbering used above.
Example 1: A method of inter prediction for an image block, wherein weighted prediction parameters are indicated for a set of prediction blocks and non-rectangular inter prediction is enabled, the method comprising: obtaining inter prediction mode parameters for a block, wherein the obtaining comprises checking whether a non-rectangular inter prediction mode is enabled for the set of blocks comprising a prediction block; and obtaining weighted prediction parameters associated with the blocks, wherein the inter prediction mode parameters for a block refer to a reference picture indicated for the block, and the weighted prediction parameters are specified for the set of blocks.
Example 2: The method of embodiment 1, wherein non-rectangular inter prediction is enabled by indicating that the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
Example 3: The method of embodiment 1 or 2, wherein, when the weighted prediction parameters indicate that weighted prediction is enabled for at least one reference index, it is concluded that non-rectangular inter prediction is disabled.
Example 4: the method according to any of embodiments 1-3, wherein the set of blocks is one picture, and the enabling of the non-rectangular inter prediction mode parameter and the weighted prediction parameter are both indicated in a picture header.
Example 5: the method according to any of embodiments 1-4, wherein the set of blocks is a slice, and the enabling of the non-rectangular inter prediction mode parameter and the weighted prediction parameter are both indicated in a slice header.
Example 6: the method according to any of embodiments 1-5, wherein the inter prediction mode parameters comprise a reference index for determining a reference picture and motion vector information for determining a position of a reference block in the reference picture.
Example 7: the method as in any one of embodiments 1-6, wherein the non-rectangular fusion mode is a triangulation mode.
Example 8: the method as in any one of embodiments 1-7, wherein the non-rectangular fusion mode is a GEO mode.
Example 9: the method as in any one of embodiments 1-8 wherein the weighted prediction is a slice-level illumination compensation mechanism (e.g., global weighted prediction).
Example 10: the method as in any one of embodiments 1-9, wherein the weighted prediction is a block-level illumination compensation mechanism, such as Local Illumination Compensation (LIC).
Example 11: The method as in any one of embodiments 1-10, wherein the weighted prediction parameters comprise: a set of flags indicating whether weighted prediction is applied to luma and chroma components of the prediction block; and linear model parameters α and β, specifying a linear transformation of the values of the prediction blocks.
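The interaction described in Examples 1 to 3 can be summarized by a simple check. The following Python sketch is an assumption-laden illustration of that logic; the function and parameter names are hypothetical and it is not the actual VVC syntax or a normative process.

def non_rectangular_mode_enabled(max_num_triangle_merge_cand, wp_enabled_per_ref_idx):
    # Example 2: the non-rectangular (triangle/GEO) fusion mode requires
    # MaxNumTriangleMergeCand to be greater than 1.
    if max_num_triangle_merge_cand <= 1:
        return False
    # Example 3: if weighted prediction is enabled for at least one reference
    # index, non-rectangular inter prediction is concluded to be disabled.
    if any(wp_enabled_per_ref_idx):
        return False
    return True

print(non_rectangular_mode_enabled(2, [False, False]))  # True
print(non_rectangular_mode_enabled(2, [True, False]))   # False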
According to a first aspect of the present application, as shown in fig. 12, an inter prediction method 1200 is disclosed, wherein the method comprises: s1201: determining whether a non-rectangular inter prediction mode is allowed for a set of blocks; s1202: obtaining one or more inter prediction mode parameters and weighted prediction parameters for the set of blocks; s1203: obtaining a prediction value of the current block according to the one or more inter prediction mode parameters and the weighted prediction parameter, wherein one inter prediction mode parameter of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks comprises the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and, if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In one possible implementation, the determining that the non-rectangular inter prediction mode is allowed to be used includes: indicating that the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the group of blocks constitutes a picture, and the weighted prediction parameters and the indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are both in a picture header of the picture.
In one possible implementation, the group of blocks constitutes one slice, and the weighted prediction parameter and indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
According to a second aspect of the present application, as shown in fig. 13, an inter prediction apparatus 1300 is disclosed, wherein the apparatus comprises: a non-transitory memory 1301 having stored therein processor-executable instructions; a processor 1302 coupled to the memory 1301 and configured to execute the processor-executable instructions to implement any possible implementation manner of the first aspect of the present application.
According to a third aspect of the present application, a code stream for inter prediction is disclosed, wherein the code stream includes: indication information for determining whether a non-rectangular inter prediction mode is allowed for a group of blocks; one or more inter prediction mode parameters and weighted prediction parameters of the group of blocks, wherein a prediction value of a current block is obtained according to the one or more inter prediction mode parameters and the weighted prediction parameters, one inter prediction mode parameter of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks includes the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In one possible implementation manner, the indication information includes: the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the set of blocks constitutes a picture, and the weighted prediction parameter and the indication information are both in a picture header of the picture.
In a possible implementation, the group of blocks constitutes a slice, and the weighted prediction parameter and the indication information are both in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
According to a fourth aspect of the present application, as shown in fig. 14, an inter prediction apparatus 1400 is disclosed, wherein the apparatus comprises: a determination module 1401 for determining whether a non-rectangular inter prediction mode is allowed for a group of blocks; an obtaining module 1402 for obtaining one or more inter prediction mode parameters and weighted prediction parameters of the set of blocks; a prediction module 1403, configured to obtain a prediction value of the current block according to the one or more inter prediction mode parameters and the weighted prediction parameter, where one of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks includes the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and, if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In a possible implementation manner, the determining module 1401 is specifically configured to: indicate that the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the group of blocks constitutes a picture, and the weighted prediction parameter and indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are both in a picture header of the picture.
In one possible implementation, the group of blocks constitutes one slice, and the weighted prediction parameter and indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
Further, the following embodiments/aspects are also provided herein, which can be used in any combination with each other as they are deemed suitable for practical use. These embodiments/aspects are listed below:
according to a first aspect of the present application, there is provided an inter prediction method, wherein the method includes: determining whether a non-rectangular inter prediction mode is allowed for a group of blocks; obtaining one or more inter prediction mode parameters and weighted prediction parameters for the set of blocks; obtaining a prediction value of the current block according to the one or more inter prediction mode parameters and the weighted prediction parameter, wherein one inter prediction mode parameter of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks comprises the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In one possible implementation, the determining that the non-rectangular inter prediction mode is allowed to be used includes: indicating that the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the group of blocks constitutes a picture, and the weighted prediction parameters and the indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are both in a picture header of the picture.
In one possible implementation, the group of blocks constitutes one slice, and the weighted prediction parameter and indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the syntax elements related to the number of fusion mode candidates (representing information for determining the non-rectangular inter prediction) are indicated in a Sequence Parameter Set (SPS). Here, when a picture includes only one slice, the picture header may be signaled in the slice header.
In one possible implementation, when a picture includes only one slice, the picture header is signaled in the slice header.
In a possible implementation, the picture parameter set comprises a flag whose value specifies whether the weighted prediction parameter is present in the picture header or in the slice header.
In one possible implementation, a flag in the picture header indicates whether a slice of a non-intra type is present and indicates whether inter prediction mode parameters are indicated for the slice.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
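The signalling choice described above, where a flag in the picture parameter set decides whether the weighted prediction parameters are carried once in the picture header or in each slice header, can be sketched as follows in Python; the flag and field names are hypothetical and do not reproduce the actual bitstream syntax.

def weighted_prediction_params_for_slice(pps, picture_header, slice_header):
    # Hypothetical names: pps["wp_in_ph_flag"] == 1 means the weighted prediction
    # parameters are signalled once in the picture header; otherwise each slice
    # header carries its own parameters.
    if pps["wp_in_ph_flag"] == 1:
        return picture_header["pred_weight_table"]
    return slice_header["pred_weight_table"]

pps = {"wp_in_ph_flag": 1}
ph = {"pred_weight_table": {"luma_weights": [64], "luma_offsets": [0]}}
sh = {"pred_weight_table": None}
print(weighted_prediction_params_for_slice(pps, ph, sh))  # taken from the picture header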
According to a second aspect of the present application, there is provided an inter prediction apparatus, wherein the apparatus comprises: a non-transitory memory storing processor-executable instructions; a processor coupled to the memory and configured to execute the processor-executable instructions to implement any of the possible implementations of the first aspect of the present application.
According to a third aspect of the present application, a bitstream for inter prediction is disclosed, wherein the bitstream includes: indication information for determining whether a non-rectangular inter prediction mode is allowed for a group of blocks; one or more inter prediction mode parameters and weighted prediction parameters of the group of blocks, wherein a prediction value of a current block is obtained according to the one or more inter prediction mode parameters and the weighted prediction parameters, one inter prediction mode parameter of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks includes the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and, if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In one possible implementation manner, the indication information includes: the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the set of blocks constitutes a picture, and the weighted prediction parameter and the indication information are both in a picture header of the picture.
In a possible implementation, the group of blocks constitutes a slice, and the weighted prediction parameter and the indication information are both in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
According to a fourth aspect of the present application, there is provided an inter prediction apparatus, wherein the apparatus comprises: a determining module for determining whether a non-rectangular inter prediction mode is allowed for a group of blocks; an obtaining module for obtaining one or more inter prediction mode parameters and weighted prediction parameters of the set of blocks; a prediction module, configured to obtain a prediction value of a current block according to the one or more inter prediction mode parameters and the weighted prediction parameter, where one of the inter prediction mode parameters represents reference image information of the current block, and the group of blocks includes the current block.
In one possible implementation, the reference picture information includes whether weighted prediction is enabled for a reference picture index; and, if weighted prediction is enabled, disabling the non-rectangular inter prediction mode.
In one possible implementation, the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
In a possible implementation manner, the determining module is specifically configured to: indicate that the maximum number of triangle fusion candidates (MaxNumTriangleMergeCand) is greater than 1.
In a possible implementation, the group of blocks constitutes a picture, and the weighted prediction parameters and the indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are both in a picture header of the picture.
In one possible implementation, the group of blocks constitutes one slice, and the weighted prediction parameter and indication information for determining whether the non-rectangular inter prediction mode is allowed to be used are in a slice header of the slice.
In one possible implementation, the non-rectangular inter-prediction mode is a triangulation mode.
In one possible implementation, the non-rectangular inter-prediction mode is a Geometric (GEO) partition mode.
In one possible implementation, the weighted prediction parameters are used for slice level illumination compensation.
In one possible implementation, the weighted prediction parameters are used for block-level illumination compensation.
In one possible implementation, the weighted prediction parameters include: a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block; a linear model parameter specifying a linear transformation of values of the prediction block.
Further, the following examples are also provided herein:
1. a method of encoding a video sequence, wherein the method comprises:
determining whether the weighted prediction parameter may exist in the picture header or in the slice header to obtain a determination result;
indicating by a flag in an image parameter set that the weighted prediction parameter may be present in the image header or present in the slice header according to the determination result.
2. The method of embodiment 1, wherein the flag in the picture parameter set is equal to 1, indicating that the weighted prediction parameters are not present in a slice header referring to the picture parameter set, but may be present in a picture header referring to the PPS; the flag in the picture parameter set is equal to 0, indicating that the weighted prediction parameters are not present in a picture header referring to the picture parameter set, but may be present in a slice header referring to the picture parameter set.
3. The method of embodiment 1 or 2, wherein the method further comprises: indicating the weighted prediction parameters in a picture header.
4. The method according to any of the preceding embodiments, wherein the method further comprises: indicating the weighted prediction parameters in a picture header only if another flag in the picture header indicates that inter slices are allowed.
5. A method of decoding an encoded video sequence, wherein the method comprises: parsing a set of picture parameters in a codestream of the coded video sequence to obtain a value of a flag contained in the set of picture parameters; determining a weighted prediction parameter according to the obtained value of the flag may be present in a picture header of the encoded video sequence or in a slice header of the encoded video sequence.
6. The method of embodiment 5, wherein the flag in the picture parameter set is equal to 1, indicating that the weighted prediction parameters are not present in a slice header referring to the picture parameter set, but may be present in a picture header referring to the PPS; the flag in the picture parameter set is equal to 0, indicating that the weighted prediction parameters are not present in a picture header referring to the picture parameter set, but may be present in a slice header referring to the picture parameter set.
7. The method of embodiment 5 or 6, wherein the method further comprises: parsing a picture header in the code stream of the video sequence to obtain the weighted prediction parameter from the parsed picture header.
8. The method of any of embodiments 5-7, comprising: parsing a picture header in the code stream of the video sequence, wherein the picture header comprises another flag; the weighted prediction parameters are obtained from the parsed picture header only when the further flag indicates that inter slices are allowed.
9. A method for inter-predicting a current block in a set of blocks of a video sequence, wherein the method comprises the steps of any of embodiments 1 to 8, and the method further comprises:
obtaining one or more inter prediction mode parameters and weighted prediction parameters for the set of blocks;
and obtaining a prediction value of the current block according to the one or more inter-frame prediction mode parameters and the weighted prediction parameter, wherein one inter-frame prediction mode parameter in the inter-frame prediction mode parameters represents reference image information of the current block.
10. The method of embodiment 9, wherein the reference picture information comprises information on whether weighted prediction is enabled for a reference picture index; if weighted prediction is enabled, at least one non-rectangular inter prediction mode is disabled.
11. The method of embodiment 9 or 10, wherein the non-rectangular inter prediction mode is enabled if weighted prediction is disabled.
12. The method according to any of embodiments 9-11, wherein the set of blocks constitutes a picture, and the information for determining whether the use of the non-rectangular inter prediction mode is allowed is in a picture header of the picture.
13. The method according to any of embodiments 9-11, wherein the set of blocks constitutes a slice, and the information for determining whether the use of the non-rectangular inter prediction mode is allowed is in a slice header of the slice.
14. The method as in any one of embodiments 9-13, wherein the non-rectangular inter-prediction mode is a triangular partition mode.
15. The method as in any one of embodiments 9-13 wherein the non-rectangular inter-prediction mode is a geometric partitioning mode.
16. The method as in any one of embodiments 1-15, wherein, when a picture includes only one slice, the picture header is signaled in the slice header.
17. The method as in any one of embodiments 1-16 wherein the weighted prediction parameters are used for slice level illumination compensation.
18. The method as in any one of embodiments 1-16 wherein the weighted prediction parameters are used for block level illumination compensation.
19. The method as in any one of embodiments 1-18 wherein the weighted prediction parameters comprise:
a plurality of flags indicating whether weighted prediction is applied to a luminance component and/or a chrominance component of a prediction block;
a linear model parameter specifying a linear transformation of values of the prediction block.
20. An inter-prediction apparatus, wherein the apparatus comprises:
a non-transitory memory storing processor-executable instructions;
a processor coupled to the memory and configured to execute the processor-executable instructions to implement any of embodiments 1-19.
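The decoder-side condition running through embodiments 5 to 8 can be summarized in a short Python sketch; the indicator names are hypothetical simplifications added here for illustration, not the normative parsing process.

def wp_params_parsed_from_picture_header(first_indicator, second_indicator, third_indicator):
    # first_indicator: slice type indication, 1 meaning at least one inter (B/P) slice
    # second_indicator: 1 meaning weighted prediction info is carried in the picture header
    # third_indicator: 1 meaning weighted prediction is applicable to inter slices
    return first_indicator == 1 and second_indicator == 1 and third_indicator == 1

print(wp_params_parsed_from_picture_header(1, 1, 1))  # True: parse from the picture header
print(wp_params_parsed_from_picture_header(1, 0, 1))  # False: parameters are in slice headers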

Claims (24)

1. A decoding method implemented by a decoding device, the method comprising:
acquiring a code stream of a current image;
obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header;
when the value of the first indicator is equal to a first preset value and the value of the second indicator is equal to a second preset value, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current image, the first preset value is an integer value, and the second preset value is an integer value;
predicting the current block according to the value of the weighted prediction parameter.
2. The method according to claim 1, wherein the value of the first indicator is obtained from a picture header included in the codestream, or wherein the value of the second indicator is obtained from a picture parameter set included in the codestream, or wherein the value of the weighted prediction parameter is parsed from the picture header included in the codestream.
3. The method of claim 1 or 2, wherein the first preset value is 1 and the second preset value is 1.
4. The method according to any of claims 1 to 3, wherein the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice.
5. The method of claim 4, wherein the inter-slice comprises a B-slice or a P-slice.
6. A decoding method implemented by a decoding device, the method comprising:
acquiring a code stream of a current image;
obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header;
obtaining a value of a third indicator of the current image according to the code stream, wherein the third indicator indicates whether weighted prediction is applicable to an inter-frame strip, and the type of the inter-frame strip is a B strip or a P strip;
when the value of the first indicator is equal to a first preset value, the value of the second indicator is equal to a second preset value, and the value of the third indicator indicates that weighted prediction is applicable to the inter-frame slice, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value;
predicting the current block according to the value of the weighted prediction parameter.
7. The method according to claim 6, wherein the value of the first indicator is obtained from a picture header included in the codestream, or wherein the value of the second indicator is obtained from a picture parameter set included in the codestream, or wherein the value of the weighted prediction parameter is obtained from a picture header included in the codestream, or wherein the value of the third indicator is obtained from a picture parameter set included in the codestream.
8. The method according to claim 6 or 7, wherein the first preset value is 1 and the second preset value is 1.
9. The method according to any of claims 6 to 8, wherein the value of the first indicator is equal to the first preset value, indicating that the slice type of at least one slice included in the current picture is an inter-slice.
10. The method according to any of claims 6 to 9, wherein the third indicator has a value of 1, indicating that weighted prediction is applicable to the inter-slice.
11. A decoder (30) characterized in that it comprises processing circuitry for performing the method according to any one of claims 1 to 10.
12. A computer program product, characterized in that it comprises program code for performing the method according to any one of claims 1 to 10, when the program code is executed by a computer or a processor.
13. A decoder (30), comprising:
one or more processors;
a non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium is coupled with the processor and stores a program for execution by the processor, the program, when executed by the processor, causing the decoder to perform the method of any of claims 1-10.
14. A non-transitory computer-readable medium carrying program code which, when executed by a computer device, causes the computer device to perform the method of any one of claims 1 to 10.
15. A decoding device (1700), comprising:
a code stream obtaining unit (1710) for obtaining a code stream of the current image;
an indicator value obtaining unit (1720) for:
obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header of the code stream;
a parsing unit (1730) for: when the value of the first indicator is equal to a first preset value and the value of the second indicator is equal to a second preset value, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current image, the first preset value is an integer value, and the second preset value is an integer value;
a prediction unit (1740) for predicting the current block based on the value of the weighted prediction parameter.
16. The decoding apparatus (1700) of claim 15,
the indicator value obtaining unit (1720) is configured to obtain the value of the first indicator according to a picture header included in the codestream or obtain the value of the second indicator according to a picture parameter set included in the codestream; or
The parsing unit (1730) is configured to parse a value of the weighted prediction parameter from a header included in the codestream.
17. The decoding apparatus (1700) of claim 15 or 16 wherein the first preset value is 1 and the second preset value is 1.
18. The decoding apparatus (1700) of any of the preceding claims 15 through 17 wherein the value of the first indicator equals the first preset value indicating that the slice type of the at least one slice included in the current picture is an inter slice.
19. The decoding apparatus (1700) of claim 18 wherein the inter-slice comprises a B-slice or a P-slice.
20. A decoding device (1700), comprising:
a code stream obtaining unit (1710) for obtaining a code stream of the current image;
an indicator value obtaining unit (1720) for:
obtaining a value of a first indicator of the current image according to the code stream, wherein the first indicator indicates a slice type;
obtaining a value of a second indicator of the current image according to the code stream, wherein the second indicator indicates whether a weighted prediction parameter exists in an image header or a slice header of the code stream;
obtaining a value of a third indicator of the current image according to the code stream, wherein the third indicator indicates whether weighted prediction is applicable to an inter-frame slice, and the type of the inter-frame slice is a B slice or a P slice;
a parsing unit (1730) for: when the value of the first indicator is equal to a first preset value, the value of the second indicator is equal to a second preset value, and the value of the third indicator indicates that weighted prediction is applicable to the inter-frame slice, parsing the value of a weighted prediction parameter of a current block from the code stream, wherein the current block is included in a current slice of the current picture, the first preset value is an integer value, and the second preset value is an integer value;
a prediction unit (1740) for predicting the current block depending on the value of the weighted prediction parameter.
21. The decoding apparatus (1700) of claim 20,
the indicator value obtaining unit (1720) is configured to obtain a value of the first indicator according to a picture header included in the codestream, or obtain a value of the second indicator according to a picture parameter set included in the codestream, or obtain a value of the third indicator according to a picture parameter set included in the codestream; or
The parsing unit (1730) is configured to parse the weighted prediction parameters from a header included in the codestream.
22. The decoding apparatus (1700) of claim 20 or 21 wherein the first preset value is 1 and the second preset value is 1.
23. The decoding apparatus (1700) of any of the preceding claims 20 through 22 wherein the value of the first indicator is equal to the first preset value indicating that the slice type of the at least one slice included in the current picture is an inter-slice.
24. The decoding apparatus (1700) of any of the preceding claims 20 through 23 wherein the third indicator has a value of 1 indicating that weighted prediction is applicable to the inter-slice.
CN202180008825.6A 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns Active CN114930836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211586724.XA CN115988219B (en) 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202062960134P 2020-01-12 2020-01-12
US62/960,134 2020-01-12
PCT/RU2021/050003 WO2021045658A2 (en) 2020-01-12 2021-01-12 Method and apparatus of harmonizing weighted prediction with non-rectangular merge modes

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211586724.XA Division CN115988219B (en) 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns

Publications (2)

Publication Number Publication Date
CN114930836A true CN114930836A (en) 2022-08-19
CN114930836B CN114930836B (en) 2023-12-15

Family

ID=74853461

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211586724.XA Active CN115988219B (en) 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns
CN202180008825.6A Active CN114930836B (en) 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202211586724.XA Active CN115988219B (en) 2020-01-12 2021-01-12 Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns

Country Status (10)

Country Link
US (1) US20220400260A1 (en)
EP (1) EP4078965A4 (en)
JP (1) JP2023510858A (en)
KR (1) KR20220123717A (en)
CN (2) CN115988219B (en)
AU (1) AU2021201607A1 (en)
BR (1) BR112022013803A2 (en)
CA (1) CA3167535A1 (en)
MX (1) MX2022008593A (en)
WO (1) WO2021045658A2 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112022012154A2 (en) * 2019-12-20 2022-08-30 Lg Electronics Inc IMAGE/VIDEO CODING METHOD AND DEVICE BASED ON WEIGHTED PREDICTION
AU2021207558A1 (en) * 2020-01-13 2022-09-08 Lg Electronics Inc. Method and device for coding image/video on basis of prediction weighted table
AU2021207559A1 (en) * 2020-01-13 2022-09-08 Lg Electronics Inc. Method and device for weighted prediction for image/video coding
CN115244937A (en) * 2020-01-14 2022-10-25 Lg电子株式会社 Image encoding/decoding method and apparatus for signaling information related to sub-picture and picture header and method for transmitting bitstream
KR20220143030A (en) * 2020-02-19 2022-10-24 바이트댄스 아이엔씨 Signaling of prediction weights in the general constraint information of the bitstream
AR121127A1 (en) * 2020-02-29 2022-04-20 Beijing Bytedance Network Tech Co Ltd SIGNALING OF REFERENCE IMAGE INFORMATION IN A VIDEO BITSTREAM
US11563963B2 (en) 2020-05-19 2023-01-24 Qualcomm Incorporated Determining whether to code picture header data of pictures of video data in slice headers
WO2022198144A1 (en) * 2021-03-30 2022-09-22 Innopeak Technology, Inc. Weighted prediction for video coding


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100878827B1 (en) * 2005-07-08 2009-01-14 엘지전자 주식회사 Method for modeling coding information of a video signal to compress/decompress the information
WO2007114608A1 (en) * 2006-03-30 2007-10-11 Lg Electronics Inc. A method and apparatus for decoding/encoding a video signal
US8767824B2 (en) * 2011-07-11 2014-07-01 Sharp Kabushiki Kaisha Video decoder parallelization for tiles
CN106899847B (en) * 2011-10-17 2019-12-27 株式会社东芝 Electronic device and decoding method
US9516308B2 (en) * 2012-04-27 2016-12-06 Qualcomm Incorporated Parameter set updates in video coding
US9497473B2 (en) * 2013-10-03 2016-11-15 Qualcomm Incorporated High precision explicit weighted prediction for video coding
GB2547052B (en) * 2016-02-08 2020-09-16 Canon Kk Methods, devices and computer programs for encoding and/or decoding images in video bit-streams using weighted predictions
CN112236995A (en) * 2018-02-02 2021-01-15 苹果公司 Multi-hypothesis motion compensation techniques

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103959793A (en) * 2011-10-10 2014-07-30 高通股份有限公司 Efficient signaling of reference picture sets
US20130329790A1 (en) * 2012-06-08 2013-12-12 Texas Instruments Incorporated Method and System for Reducing Slice Header Parsing Overhead in Video Coding
US20140105299A1 (en) * 2012-09-30 2014-04-17 Qualcomm Incorporated Performing residual prediction in video coding
US20150264404A1 (en) * 2014-03-17 2015-09-17 Nokia Technologies Oy Method and apparatus for video coding and decoding

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BENJAMIN BROSS: "JVET-P2001-vE, Versatile Video Coding (Draft 7)" *
HENDRY: "JVET-Q0200,[AHG9]: On picture level and slice level tool parameters" *
潘榕; 董文辉: "AVS+视频编码技术及相关测试标准解读" (Pan Rong; Dong Wenhui: "Interpretation of AVS+ Video Coding Technology and Related Test Standards") *

Also Published As

Publication number Publication date
EP4078965A4 (en) 2023-04-05
US20220400260A1 (en) 2022-12-15
CN114930836B (en) 2023-12-15
EP4078965A2 (en) 2022-10-26
CN115988219A (en) 2023-04-18
BR112022013803A2 (en) 2022-09-13
WO2021045658A3 (en) 2021-07-15
CA3167535A1 (en) 2021-03-11
AU2021201607A1 (en) 2022-08-11
MX2022008593A (en) 2022-10-20
CN115988219B (en) 2024-01-16
JP2023510858A (en) 2023-03-15
KR20220123717A (en) 2022-09-08
WO2021045658A2 (en) 2021-03-11
WO2021045658A9 (en) 2021-06-10

Similar Documents

Publication Publication Date Title
CN115988219B (en) Method and apparatus for coordinating weighted prediction using non-rectangular fusion patterns
CN115567717B (en) Encoder, decoder and corresponding methods and devices
CN112823518A (en) Apparatus and method for inter prediction of triangularly partitioned blocks of coded blocks
CN113613013B (en) Video decoding device, decoding method implemented by decoding device and decoder
CN113841405B (en) Method and apparatus for local illumination compensation for inter prediction
CN113924780A (en) Method and device for affine inter-frame prediction of chroma subblocks
KR20210141712A (en) Encoder, Decoder and Corresponding Methods Using IBC Merge List
CN114846795B (en) Method and device for indicating fusion mode candidate quantity
US20220217332A1 (en) Harmonizing triangular merge mode with weighted prediction
JP2024055894A (en) Encoder, decoder and corresponding method for simplifying picture header signaling - Patents.com
JP7423758B2 (en) High-level signaling method and apparatus for weighted prediction
US20220247999A1 (en) Method and Apparatus of Harmonizing Weighted Prediction with Non-Rectangular Merge Modes
WO2020251419A2 (en) Method and apparatus of harmonizing weighted prediction with affine model based motion compensation for inter prediction
CN114930840A (en) Derivation of motion vector range for enhanced interpolation filter
CN113302929A (en) Sample distance calculation for geometric partitioning mode
WO2021134393A1 (en) Method and apparatus of deblocking filtering between boundaries of blocks predicted using weighted prediction and non-rectangular merge modes
CN114830652A (en) Method and apparatus for reference sample interpolation filtering for directional intra prediction

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40070262

Country of ref document: HK

GR01 Patent grant
GR01 Patent grant