US20190246114A1 - Techniques of multi-hypothesis motion compensation - Google Patents
- Publication number
- US20190246114A1 (Application No. US16/257,904)
- Authority
- US
- United States
- Prior art keywords
- prediction
- data
- coding
- pixel block
- hypothesis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION)
- H04N19/147—Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/159—Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
- H04N19/109—Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
- H04N19/176—Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a block or a macroblock
- H04N19/513—Processing of motion vectors
- H04N19/567—Motion estimation based on rate distortion criteria
- H04N19/70—Characterised by syntax aspects related to video coding, e.g. related to compression standards
Definitions
- the present disclosure relates to video coding and, in particular, to video coding techniques that employ multi-hypothesis motion compensation coding (also, “MHMC” coding).
- Modern video compression systems such as MPEG-4 AVC, HEVC, VP9, VP8, and AV1, often employ block-based strategies for video compression.
- In such systems, a rectangular video region (e.g. an image, a tile, or rows within such regions) is partitioned into rectangular or square blocks (called "pixel blocks" for convenience), and coding operations are applied to each pixel block.
- MHMC is an inter-prediction method in which pixel blocks within a region, such as a frame, tile, or slice, may be coded several different ways (called "hypotheses"), with the hypotheses combined together.
- a prediction block s̄ for an input pixel block s can be generated using a set of N hypotheses h_i as follows:

  s̄ = Σ_{i=0}^{N−1} w_i*h_i, (Eq. 1)

  where w_i is the weight factor applied to hypothesis h_i.
- Motion vectors indicate displacement information, i.e. the displacement of the current block relative to the reference frame identified for its respective hypothesis. Commonly, such displacement is limited to translational motion only.
- Current MHMC coding systems merge hypotheses via linear models as shown in Eq. 1.
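- Purely for illustration (this sketch is not part of the disclosed syntax; the array shapes and the equal weights are assumptions), the linear merge of Eq. 1 may be expressed as:

```python
import numpy as np

def combine_hypotheses(hypotheses, weights):
    """Merge N motion-compensated prediction hypotheses into a single
    prediction block, per the linear model of Eq. 1."""
    assert len(hypotheses) == len(weights)
    pred = np.zeros_like(hypotheses[0], dtype=np.float64)
    for h_i, w_i in zip(hypotheses, weights):
        pred += w_i * np.asarray(h_i, dtype=np.float64)
    return pred

# Bi-prediction-style example: two hypotheses, equal weights.
h0 = np.full((8, 8), 100.0)   # prediction fetched via mv0 from reference 0
h1 = np.full((8, 8), 110.0)   # prediction fetched via mv1 from reference 1
print(combine_hypotheses([h0, h1], [0.5, 0.5])[0, 0])   # -> 105.0
```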
- Modern codecs, such as AVC and HEVC, do not mandate the number of hypotheses used for a block in a multihypothesis-predicted frame. Instead, a block may use one or more hypotheses, or even be predicted using intra prediction.
- In practice, the maximum number of hypotheses that can be used is limited to two. This is due mostly to the cost of signalling the associated prediction parameters, but also to the complexity and bandwidth of reading data and reconstructing the final prediction. At lower bitrates, for example, mode and motion information can dominate the overall bitrate of an encoded stream.
- Moreover, increasing the number of hypotheses even slightly, e.g. from two to three, could result in a considerable (e.g. 33%) increase in bandwidth in a system and, therefore, impact power consumption. Data prefetching and memory management also can become more complex and costly. Current coding protocols do not always provide efficient mechanisms to communicate video coding parameters between encoders and decoders.
- the indication of prediction lists to be used in B slices is performed using a single parameter, which is sent at the prediction unit level of the coding syntax.
- a parameter may indicate whether a block will be using the List0 prediction, List1 prediction, or Bi-prediction.
- This type of signalling may also include indication of other coding modes as well, such as SKIP and DIRECT modes, Intra, etc.
- This method, therefore, incurs cost in terms of the signalling overhead needed to communicate these parameters, and that cost can be expected to increase if MHMC coding techniques were employed that expand the number of hypotheses beyond two.
- Likewise, the overhead cost of signalling other coding parameters, such as prediction modes and motion vectors, would increase if MHMC coding techniques expand the number of hypotheses beyond two.
- Accordingly, the inventors perceive a need for techniques to improve MHMC coding operations that provide greater flexibility to coding systems in representing image data in coding.
- FIG. 1 illustrates a simplified block diagram of a video delivery system 100 according to an aspect of the present disclosure.
- FIG. 2 is a functional block diagram illustrating components of an encoding terminal.
- FIG. 3 is a functional block diagram illustrating components of a decoding terminal according to an aspect of the present disclosure.
- FIG. 4 illustrates exemplary application of MHMC coding.
- FIGS. 5(a) and 5(b) illustrate exemplary operation of multi-hypothesis motion compensation coding according to an aspect of the present disclosure.
- FIG. 6 is a functional block diagram of a coding system according to an aspect of the present disclosure.
- FIG. 7 is a functional block diagram of a decoding system according to an aspect of the present disclosure.
- Each coding hypothesis may include generation of prediction data for the input pixel block according to a respective prediction search.
- the input pixel block may be coded with reference to a prediction block formed from prediction data derived according to a plurality of hypotheses.
- Data of the coded pixel block may be transmitted to a channel, along with data identifying to a decoder the number of hypotheses used during coding.
- At the decoder, an inverse process may be performed, which may include generation of a counterpart prediction block from prediction data derived according to the hypotheses identified with the coded pixel block data, then decoding of the coded pixel block according to that prediction data.
- FIG. 1 illustrates a simplified block diagram of a video delivery system 100 according to an aspect of the present disclosure.
- the system 100 may include a plurality of terminals 110 , 120 interconnected via a network.
- the terminals 110 , 120 may code video data for transmission to their counterparts via the network.
- a first terminal 110 may capture video data locally, code the video data and transmit the coded video data to the counterpart terminal 120 via a channel.
- the receiving terminal 120 may receive the coded video data, decode it, and render it locally, for example, on a display at the terminal 120 . If the terminals are engaged in bidirectional exchange of video data, then the terminal 120 may capture video data locally, code the video data and transmit the coded video data to the counterpart terminal 110 via another channel.
- the receiving terminal 110 may receive the coded video data transmitted from terminal 120 , decode it, and render it locally, for example, on its own display.
- the processes described can operate on both frame and field coding but, for simplicity, the present discussion describes the techniques in the context of integral frames.
- a video coding system 100 may be used in a variety of applications.
- the terminals 110 , 120 may support real time bidirectional exchange of coded video to establish a video conferencing session between them.
- a terminal 110 may code pre-produced video (for example, television or movie programming) and store the coded video for delivery to one or, often, many downloading clients (e.g., terminal 120 ).
- the video being coded may be live or pre-produced, and the terminal 110 may act as a media server, delivering the coded video according to a one-to-one or a one-to-many distribution model.
- the type of video and the video distribution schemes are immaterial unless otherwise noted.
- the terminals 110 , 120 are illustrated as a personal computer and a smart phone, respectively, but the principles of the present disclosure are not so limited. Aspects of the present disclosure also find application with various types of computers (desktop, laptop, and tablet computers), computer servers, media players, dedicated video conferencing equipment and/or dedicated video encoding equipment.
- the network 130 represents any number of networks that convey coded video data between the terminals 110 , 120 , including for example wireline and/or wireless communication networks.
- the communication network may exchange data in circuit-switched or packet-switched channels.
- Representative networks include telecommunications networks, local area networks, wide area networks, and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network are immaterial to the operation of the present disclosure unless otherwise noted.
- FIG. 2 is a functional block diagram illustrating components of an encoding terminal according to an aspect of the present disclosure.
- the encoding terminal may include a video source 210 , an image processor 220 , a coding system 230 , and a transmitter 240 .
- the video source 210 may supply video to be coded.
- the video source 210 may be provided as a camera that captures image data of a local environment, a storage device that stores video from some other source or a network connection through which source video data is received.
- the image processor 220 may perform signal-conditioning operations on the video to be coded to prepare the video data for coding. For example, the image processor 220 may alter the frame rate, frame resolution, and/or other properties of the source video.
- the image processor 220 also may perform filtering operations on the source video.
- the coding system 230 may perform coding operations on the video to reduce its bandwidth. Typically, the coding system 230 exploits temporal and/or spatial redundancies within the source video. For example, the coding system 230 may perform motion-compensated predictive coding, in which video frames or fields are parsed into sub-units (called "pixel blocks," for convenience), and individual pixel blocks are coded differentially with respect to predicted pixel blocks, which are derived from previously-coded video data. A given pixel block may be coded according to any one of a variety of predictive coding modes, such as:
- the coding system 230 may include a forward coder 232 , a decoder 233 , an in-loop filter 234 , a frame buffer 235 , and a predictor 236 .
- the coder 232 may apply the differential coding techniques to the input pixel block using predicted pixel block data supplied by the predictor 236 .
- the decoder 233 may invert the differential coding techniques applied by the coder 232 to a subset of coded frames designated as reference frames.
- the in-loop filter 234 may apply filtering techniques to the reconstructed reference frames generated by the decoder 233 .
- the frame buffer 235 may store the reconstructed reference frames for use in prediction operations.
- the predictor 236 may predict data for input pixel blocks from within the reference frames stored in the frame buffer.
- the transmitter 240 may transmit coded video data to a decoding terminal via a channel CH.
- FIG. 3 is a functional block diagram illustrating components of a decoding terminal according to an aspect of the present disclosure.
- the decoding terminal may include a receiver 310 to receive coded video data from the channel, a video decoding system 320 that decodes coded data, a post-processor 330 , and a video sink 340 that consumes the video data.
- the receiver 310 may receive a data stream from the network and may route components of the data stream to appropriate units within the terminal 300 .
- FIGS. 2 and 3 illustrate functional units for video coding and decoding in the terminals 110, 120 (FIG. 1).
- the receiver 310 may parse the coded video data from other elements of the data stream and route it to the video decoder 320 .
- the video decoder 320 may perform decoding operations that invert coding operations performed by the coding system 230 (FIG. 2).
- the video decoder may include a decoder 322 , an in-loop filter 324 , a frame buffer 326 , and a predictor 328 .
- the decoder 322 may invert the differential coding techniques applied by the coder 232 (FIG. 2) to the coded frames.
- the in-loop filter 324 may apply filtering techniques to reconstructed frame data generated by the decoder 322 .
- the in-loop filter 324 may perform various filtering operations (e.g., de-blocking, de-ringing filtering, sample adaptive offset processing, and the like).
- the filtered frame data may be output from the decoding system.
- the frame buffer 326 may store reconstructed reference frames for use in prediction operations.
- the predictor 328 may predict data for input pixel blocks from within the reference frames stored by the frame buffer according to prediction reference data provided in the coded video data.
- the post-processor 330 may perform operations to condition the reconstructed video data for display. For example, the post-processor 330 may perform various filtering operations (e.g., de-blocking, de-ringing filtering, and the like), which may obscure visual artifacts in output video that are generated by the coding/decoding process. The post-processor 330 also may alter resolution, frame rate, color space, etc. of the reconstructed video to conform it to requirements of the video sink 340 .
- the video sink 340 represents various hardware and/or software components in a decoding terminal that may consume the reconstructed video.
- the video sink 340 typically may include one or more display devices on which reconstructed video may be rendered.
- the video sink 340 may be represented by a memory system that stores the reconstructed video for later use.
- the video sink 340 also may include one or more application programs that process the reconstructed video data according to controls provided in the application program.
- the video sink may represent a transmission system that transmits the reconstructed video to a display on another device, separate from the decoding terminal; for example, reconstructed video generated by a notebook computer may be transmitted to a large flat panel display for viewing.
- FIGS. 2 and 3 illustrate operations that are performed to code and decode video data in a single direction between terminals, such as from terminal 110 to terminal 120 (FIG. 1).
- each terminal 110 , 120 will possess the functional units associated with an encoding terminal ( FIG. 2 ) and each terminal 110 , 120 will possess the functional units associated with a decoding terminal ( FIG. 3 ).
- terminals 110 , 120 may exchange multiple streams of coded video in a single direction, in which case, a single terminal (say terminal 110 ) will have multiple instances of an encoding terminal ( FIG. 2 ) provided therein.
- Such implementations are fully consistent with the present discussion.
- FIG. 4 illustrates exemplary application of MHMC coding.
- a block 412 in the current frame 410 may be coded with reference to N reference frames 420.1-420.n-1.
- Motion vectors mv1-mvn may be derived that reference prediction sources in the reference frames 420.1-420.n-1.
- the prediction sources may be combined as per Eq. 1 to yield a prediction block s̄ that serves as a predictor for block 412.
- coders may develop coding hypotheses for an input block using different block sizes.
- One set of exemplary coding hypotheses is shown in FIGS. 5( a ) and 5( b ) .
- an input block 512 from an input frame 510 may be coded using coding hypotheses that use different block sizes.
- a first hypothesis is represented by motion vector mv1, which codes the input block 512 with reference to a block 522 in a first reference frame 520; the reference block 522 has a size equal to that of block 512.
- a second set of hypotheses is represented by motion vectors mv2.1-mv2.4.
- the input block 512 is partitioned into a plurality of sub-blocks (here, four sub-blocks 514.1-514.4), each of which is predictively coded.
- FIG. 5(a) illustrates each of the sub-blocks 514.1-514.4 coded with reference to content from a common reference frame 530, shown as reference blocks 532.1-532.4.
- each of the sub-blocks may be predicted independently of the other sub-blocks, which may cause prediction references to be selected from the same or different reference frames (not shown).
- FIG. 5(b) illustrates the relationship between the coding hypotheses and a prediction block s̄ 540 that will be used for coding and decoding. Content of the various hypotheses may be merged together to generate the prediction block.
- the prediction block 540 may be generated from prediction data contributed by block 522 of the first hypothesis and prediction data contributed by the sub-blocks 532.1-532.4, which may be weighted according to respective weight factors as shown in Eq. 1.
- Although FIGS. 5(a) and 5(b) illustrate coding by two hypotheses (the second hypothesis being formed from predictions of four sub-blocks), in principle, MHMC coding may be extended to a greater number of hypotheses.
- Similarly, although FIG. 5(b) illustrates a linear weighting model, non-linear weighting models also may be used, as sketched below.
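- The following sketch (an illustration only; the 0.5/0.5 weights and the block shapes are assumptions, not disclosed values) shows how the full-size hypothesis and the four-sub-block hypothesis of FIGS. 5(a)/5(b) might be merged per Eq. 1:

```python
import numpy as np

def merge_mixed_size_hypotheses(h_full, sub_preds, w1=0.5, w2=0.5):
    """Blend a full-size first hypothesis (block 522) with a second
    hypothesis assembled from four independently predicted sub-blocks
    (532.1-532.4), per the weighted merge of Eq. 1."""
    H, W = h_full.shape
    h2 = np.empty_like(h_full)
    h2[:H//2, :W//2] = sub_preds[0]   # sub-block 514.1, fetched via mv2.1
    h2[:H//2, W//2:] = sub_preds[1]   # sub-block 514.2, fetched via mv2.2
    h2[H//2:, :W//2] = sub_preds[2]   # sub-block 514.3, fetched via mv2.3
    h2[H//2:, W//2:] = sub_preds[3]   # sub-block 514.4, fetched via mv2.4
    return w1 * h_full + w2 * h2      # aggregate prediction block
```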
- FIG. 6 is a functional block diagram of a coding system 600 according to an aspect of the present disclosure.
- the system 600 may include a pixel block coder 610 , a pixel block decoder 620 , a frame buffer 630 , an in-loop filter system 640 , a reference frame store 650 , a predictor 660 , a controller 670 , and a syntax unit 680 .
- the predictor 660 may develop the different hypotheses for use during coding of a newly-presented input pixel block s, and it may supply a prediction block s̄ to the pixel block coder 610.
- the pixel block coder 610 may code the new pixel block by predictive coding techniques and present coded pixel block data to the syntax unit 680 .
- the pixel block decoder 620 may decode the coded pixel block data, generating decoded pixel block data therefrom.
- the frame buffer 630 may generate reconstructed frame data from the decoded pixel block data.
- the in-loop filter 640 may perform one or more filtering operations on the reconstructed frame. For example, the in-loop filter 640 may perform deblocking filtering, sample adaptive offset (SAO) filtering, adaptive loop filtering (ALF), maximum likelihood (ML) based filtering schemes, deringing, debanding, sharpening, resolution scaling, and the like.
- the reference frame store 650 may store the filtered frame, where it may be used as a source of prediction of later-received pixel blocks.
- the syntax unit 680 may assemble a data stream from the coded pixel block data, which conforms to a governing coding protocol.
- the predictor 660 may select the different hypotheses from among the different candidate prediction modes that are available under a governing coding syntax.
- the predictor 660 may decide, for example, the number of hypotheses that may be used, the prediction sources for those hypotheses and, in certain aspects, partitioning sizes at which the predictions will be performed. For example, the predictor 660 may decide whether a given input pixel block will be coded using a prediction block that matches the sizes of the input pixel block or whether it will be coded using prediction blocks at smaller sizes.
- the predictor 660 also may decide, for some smaller-size partitions of the input block, that SKIP coding will be applied to one or more of the partitions (called “null” coding herein).
- the pixel block coder 610 may include a subtractor 612 , a transform unit 614 , a quantizer 616 , and an entropy coder 618 .
- the pixel block coder 610 may accept pixel blocks of input data at the subtractor 612 .
- the subtractor 612 may receive predicted pixel blocks from the predictor 660 and generate an array of pixel residuals therefrom representing a difference between the input pixel block and the predicted pixel block.
- the transform unit 614 may apply a transform to the sample data output from the subtractor 612 , to convert data from the pixel domain to a domain of transform coefficients.
- the quantizer 616 may perform quantization of transform coefficients output by the transform unit 614 .
- the quantizer 616 may be a uniform or a non-uniform quantizer.
- the entropy coder 618 may reduce bandwidth of the output of the coefficient quantizer by coding the output, for example, by variable length code words or using a context adaptive binary arithmetic coder.
- the transform unit 614 may operate in a variety of transform modes as determined by the controller 670 .
- the transform unit 614 may apply a discrete cosine transform (DCT), a discrete sine transform (DST), a Walsh-Hadamard transform, a Haar transform, a Daubechies wavelet transform, or the like.
- the controller 670 may select a coding mode M to be applied by the transform unit 614, may configure the transform unit 614 accordingly, and may signal the coding mode M in the coded video data, either expressly or impliedly.
- the quantizer 616 may operate according to a quantization parameter Q P that is supplied by the controller 670 .
- the quantization parameter Q P may be applied to the transform coefficients as a multi-value quantization parameter, which may vary, for example, across different coefficient locations within a transform-domain pixel block.
- the quantization parameter Q P may be provided as a quantization parameters array.
- the entropy coder 618 may perform entropy coding of data output from the quantizer 616 .
- the entropy coder 618 may perform run length coding, Huffman coding, Golomb coding, Context Adaptive Binary Arithmetic Coding, and the like.
- the pixel block decoder 620 may invert coding operations of the pixel block coder 610 .
- the pixel block decoder 620 may include a dequantizer 622 , an inverse transform unit 624 , and an adder 626 .
- the pixel block decoder 620 may take its input data from an output of the quantizer 616 . Although permissible, the pixel block decoder 620 need not perform entropy decoding of entropy-coded data since entropy coding is a lossless event.
- the dequantizer 622 may invert operations of the quantizer 616 of the pixel block coder 610 .
- the dequantizer 622 may perform uniform or non-uniform de-quantization as specified by the decoded signal Q P .
- the inverse transform unit 624 may invert operations of the transform unit 614 .
- the dequantizer 622 and the inverse transform unit 624 may use the same quantization parameters Q P and transform mode M as their counterparts in the pixel block coder 610 . Quantization operations likely will truncate data in various respects and, therefore, data recovered by the dequantizer 622 likely will possess coding errors when compared to the data presented to the quantizer 616 in the pixel block coder 610 .
- the adder 626 may invert operations performed by the subtractor 612 . It may receive the same prediction pixel block from the predictor 660 that the subtractor 612 used in generating residual signals. The adder 626 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 624 and may output reconstructed pixel block data.
- the frame buffer 630 may assemble a reconstructed frame from the output of the pixel block decoders 620 .
- the in-loop filter 640 may perform various filtering operations on recovered pixel block data.
- the in-loop filter 640 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in loop filters (not shown).
- the reference frame store 650 may store filtered frame data for use in later prediction of other pixel blocks. Different types of prediction data are made available to the predictor 660 for different prediction modes. For example, for an input pixel block, intra prediction takes a prediction reference from decoded data of the same frame in which the input pixel block is located. Thus, the reference frame store 650 may store decoded pixel block data of each frame as it is coded. For the same input pixel block, inter prediction may take a prediction reference from previously coded and decoded frame(s) that are designated as reference frames. Thus, the reference frame store 650 may store these decoded reference frames.
- the predictor 660 may supply prediction blocks s̄ to the pixel block coder 610 for use in generating residuals.
- the predictor 660 may include, for each of a plurality of hypotheses 661.1-661.n, an inter predictor 662, an intra predictor 663, and a mode decision unit 664.
- the different hypotheses 661.1-661.n may operate at different partition sizes, as described above.
- the inter predictor 662 may receive pixel block data representing a new pixel block to be coded and may search reference frame data from store 650 for pixel block data from reference frame(s) for use in coding the input pixel block.
- the inter predictor 662 may perform its searches at the partition sizes of the respective hypothesis. Thus, when searching at smaller partition sizes, the inter predictor 662 may perform multiple searches, one for each of the sub-partitions in use for its respective hypothesis.
- the inter predictor 662 may select prediction reference data that provides a closest match to the input pixel block being coded.
- the inter predictor 662 may generate prediction reference metadata, such as prediction block size and motion vectors, to identify which portion(s) of which reference frames were selected as source(s) of prediction for the input pixel block.
- the intra predictor 663 may support Intra (I) mode coding.
- the intra predictor 663 may search, from among pixel block data of the same frame as the pixel block being coded, for content that provides a closest match to the input pixel block.
- the intra predictor 663 also may run searches at the partition size for its respective hypothesis and, when sub-partitions are employed, separate searches may be run for each sub-partition.
- the intra predictor 663 also may generate prediction mode indicators to identify which portion of the frame was selected as a source of prediction for the input pixel block.
- the mode decision unit 664 may select a final coding mode for the hypothesis from the outputs of the inter predictor 662 and the intra predictor 663.
- the mode decision unit 664 may output prediction data and the coding parameters (e.g., selection of reference frames, motion vectors and the like) for the mode selected for the respective hypothesis.
- the mode decision unit 664 will select a mode that achieves the lowest distortion when video is decoded given a target bitrate. Exceptions may arise when coding modes are selected to satisfy other policies to which the coding system 600 adheres, such as satisfying a particular channel behavior, or supporting random access or data refresh policies.
- Prediction data output from the mode decision units 664 of the different hypotheses 661.1-661.n may be input to a prediction block synthesis unit 665, which merges the prediction data into an aggregate prediction block s̄.
- the prediction block s̄ may be formed from a linear combination of the predictions of the individual hypotheses, for example, as set forth in Eq. 1, or non-linear combinations may be performed.
- the prediction block synthesis unit 665 may supply the prediction block s̄ to the pixel block coder 610.
- the predictor 660 may output to the controller 670 parameters representing coding decisions for each hypothesis.
- the controller 670 may control overall operation of the coding system 600 .
- the controller 670 may select operational parameters for the pixel block coder 610 and the predictor 660 based on analyses of input pixel blocks and also external constraints, such as coding bitrate targets and other operational parameters.
- When the controller 670 selects quantization parameters QP, the use of uniform or non-uniform quantizers, and/or the transform mode M, it may provide those parameters to the syntax unit 680, which may include data representing those parameters in the data stream of coded video data output by the system 600.
- the controller 670 also may select between different modes of operation by which the system may generate reference images and may include metadata identifying the modes selected for each portion of coded data.
- the controller 670 may revise operational parameters of the quantizer 616 and the transform unit 614 at different granularities of image data, either on a per pixel block basis or on a larger granularity (for example, per frame, per slice, per largest coding unit ("LCU") or Coding Tree Unit (CTU), or another region).
- the quantization parameters may be revised on a per-pixel basis within a coded frame.
- the controller 670 may control operation of the in-loop filter 640 and the prediction unit 660 .
- control may include, for the prediction unit 660 , mode selection (lambda, modes to be tested, search windows, distortion strategies, etc.), and, for the in-loop filter 640 , selection of filter parameters, reordering parameters, weighted prediction, etc.
- FIG. 7 is a functional block diagram of a decoding system 700 according to an aspect of the present disclosure.
- the decoding system 700 may include a syntax unit 710 , a pixel block decoder 720 , an in-loop filter 740 , a reference frame store 750 , a predictor 760 , and a controller 770 .
- the pixel block decoder 720 and predictor 760 may be instantiated for each of the hypotheses identified by the coded video data.
- the syntax unit 710 may receive a coded video data stream and may parse the coded data into its constituent parts. Data representing coding parameters may be furnished to the controller 770 , while data representing coded residuals (the data output by the pixel block coder 610 of FIG. 6 ) may be furnished to its respective pixel block decoder 720 .
- the predictor 760 may generate a prediction block s̄ from reference data available in the reference frame store 750, according to coding parameter data provided in the coded video data. It may supply the prediction block s̄ to the pixel block decoder.
- the pixel block decoder 720 may invert coding operations applied by the pixel block coder 610 ( FIG. 6 ).
- the frame buffer 730 may create a reconstructed frame from decoded pixel blocks s′ output by the pixel block decoder 720 .
- the in-loop filter 740 may filter the reconstructed frame data.
- the filtered frames may be output from the decoding system 700 . Filtered frames that are designated to serve as reference frames also may be stored in the reference frame store 750 .
- the pixel block decoder 720 may include an entropy decoder 722 , a dequantizer 724 , an inverse transform unit 726 , and an adder 728 .
- the entropy decoder 722 may perform entropy decoding to invert processes performed by the entropy coder 618 ( FIG. 6 ).
- the dequantizer 724 may invert operations of the quantizer 616 of the pixel block coder 610 (FIG. 6).
- the inverse transform unit 726 may invert operations of the transform unit 614 ( FIG. 6 ). They may use the quantization parameters Q P and transform modes M that are provided in the coded video data stream.
- the pixel blocks s′ recovered by the dequantizer 724 likely will possess coding errors when compared to the input pixel blocks s presented to the pixel block coder 610 of the encoder ( FIG. 6 ).
- the adder 728 may invert operations performed by the subtractor 612 (FIG. 6). It may receive a prediction pixel block from the predictor 760, as determined by prediction references in the coded video data stream. The adder 728 may add the prediction pixel block to reconstructed residual values output by the inverse transform unit 726 and may output reconstructed pixel block data.
- the frame buffer 730 may assemble a reconstructed frame from the output of the pixel block decoder 720 .
- the in-loop filter 740 may perform various filtering operations on recovered pixel block data as identified by the coded video data.
- the in-loop filter 740 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in loop filters. In this manner, operation of the frame buffer 730 and the in loop filter 740 mimics operation of the counterpart frame buffer 630 and in loop filter 640 of the encoder 600 ( FIG. 6 ).
- the reference frame store 750 may store filtered frame data for use in later prediction of other pixel blocks.
- the reference frame store 750 may store decoded frames as they are decoded, for use in intra prediction.
- the reference frame store 750 also may store decoded reference frames.
- the predictor 760 may supply the prediction blocks s̄ to the pixel block decoder 720.
- the predictor 760 may retrieve prediction data from the reference frame store 750 for each of the hypotheses represented in the coded video data (represented by hypothesis predictors 762.1-762.n).
- a prediction block synthesis unit 764 may generate an aggregate prediction block s̄ from the prediction data of the different hypotheses. In this manner, the prediction block synthesis unit 764 may replicate operations of the synthesis unit 665 of the encoder (FIG. 6).
- the predictor 760 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream.
- the controller 770 may control overall operation of the decoding system 700.
- the controller 770 may set operational parameters for the pixel block decoder 720 and the predictor 760 based on parameters received in the coded video data stream.
- these operational parameters may include quantization parameters QP for the dequantizer 724 and transform modes M for the inverse transform unit 726.
- the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per frame basis, a per slice basis, a per LCU/CTU basis, or based on other types of regions defined for the input image.
- encoders may provide signalling in a coded bit stream to identify the number of hypotheses used to code frame data and the parameters of each such hypothesis. For example, when frame data is to be coded by MHMC, coded video data may identify the number of hypotheses using a count value. When a given hypothesis can be coded as prediction blocks of different sizes, the coded video data may contain data identifying such prediction block sizes.
- the field number_of_hypotheses_minus2 may identify the number of hypotheses available for coding.
- the field list_prediction_implicit_present_flag[i] enforces that a hypothesis from list i is always present for all CTUs or at least for all inter CTUs in the current slice.
- the fields log2_min_luma_hypothesis_block_size_minus2[i] and log2_diff_max_min_luma_hypothesis_block_size[i] respectively may identify minimum and maximum sizes of prediction blocks that are available for each such hypothesis. Providing minimum and maximum sizes for the prediction blocks may constrain coding complexity by avoiding MHMC combinations outside the indicated size ranges.
- the foregoing example permits use of multiple hypotheses for block sizes ranging from ML×NL to MH×NH, where ML, NL, MH, and NH are defined, respectively, by the log2_min_luma_hypothesis_block_size_minus2[i] and log2_diff_max_min_luma_hypothesis_block_size[i] fields.
- If, for example, the minimum size for a hypothesis from list i were defined as 16 and the maximum size as 64, this syntax would permit prediction blocks from list i with sizes of, for example, 16×16, 32×32, and 64×64 pixels. If rectangular partitions were also permitted for prediction, they too would be limited to within the specified resolutions.
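- As a sketch of how the two log2-coded fields resolve to pixel sizes (assuming the customary HEVC-style "_minus2"/"log2_diff" conventions; the function name is illustrative):

```python
def hypothesis_block_size_range(log2_min_minus2, log2_diff_max_min):
    """Recover the (min, max) luma block sizes for one hypothesis list
    from the two log2-coded fields of Table 1."""
    min_size = 1 << (log2_min_minus2 + 2)      # e.g. field value 2 -> 16
    max_size = min_size << log2_diff_max_min   # e.g. diff 2 -> 16 << 2 = 64
    return min_size, max_size

# The example from the text: minimum 16, maximum 64.
print(hypothesis_block_size_range(2, 2))  # -> (16, 64)
```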
- the syntax identified in Table 1 may be provided at different levels of coded video data, such as at the sequence-, picture-, slice-level, a segment, or for a group of CTUs.
- Table 2 illustrates application of the syntax to an exemplary HEVC-style slice syntactic element.
- the syntax elements may be assigned to these levels dynamically based on coding decision performed by an encoder. In this manner, the syntax increases flexibility in the encoder and permits the encoder to conserve signalling overhead when the syntax elements may be provided in higher-level elements of the coding protocol.
- Table 2 provides an example in which slice segment header syntax is extended to include parameters that indicate for which block sizes a particular hypothesis will be valid or not.
- a single parameter is identified for the minimum width and height, i.e. log2_min_luma_hypothesis_block_size_minus2[i] as well as for the maximum, i.e. log2_diff_max_min_luma_hypothesis_block_size_minus2[i].
- the list_prediction_implicit_present_flag[i] discussed above is provided as well. Independent parameters for the width and height could also be used.
- Additional elements that are impacted by the use of multihypothesis prediction instead of bi-prediction also may be signalled, such as the number of references for each list (num_ref_idx_active_minus1_list[i]), the use of zero motion prediction (mvd_list_zero_flag[i]), and the list from which the collocated temporal motion vector will be derived.
- the foregoing syntax table also provides examples of other coding parameters that may be included.
- the num_ref_idx_active_minus1_list[i] parameter may specify a maximum reference index for a respective reference picture list that may be used to decode a slice.
- the collocated_from_list parameter, for example, may identify a list from which temporal motion vector prediction is derived.
- the collocated_ref_idx parameter may identify a reference index of the collocated picture used for temporal motion vector prediction.
- MHMC coding techniques may be represented in coding of a pixel block using a syntax as shown, for example, in Table 3.
- prediction units may be defined for each prediction block in use, and their sizes may be identified using the nPbW, nPbH values.
- Unlike HEVC, which limits prediction to up to two lists and indicates the prediction type with a mode parameter, here the use of a particular hypothesis is indicated through a list-specific flag, i.e. list_pred_idc_flag[i][x0][y0]. This flag is either derived, based on syntax components signalled earlier at higher levels or on the utilization of other lists, or explicitly signalled in the bit stream.
- If this flag is one, then that hypothesis will be used in combination with the other enabled hypotheses; if it is set to zero, then a hypothesis from that list will not be considered.
- Other parameters, such as reference indices and motion vectors, as well as other information that may be signalled at this level, such as illumination compensation parameters, can then be signalled.
- the field list_pred_idc_flag[i][x0][y0] indicates whether a list corresponding to hypothesis i is to be used for coding
- ref_idx_list[i][x0][y0] indicates an index into the respective list i identifying a reference frame that was used for prediction
- mvp_list_flag[i][x0][y0] indicates a motion vector predictor index of list i.
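- A hypothetical parser for this per-list signalling might proceed as follows (a sketch only: entropy decoding is abstracted to an iterator of bits, and the one-bit reads for ref_idx_list and mvp_list_flag stand in for the real syntax):

```python
def parse_prediction_unit(bits, num_hypotheses, implicit_present):
    """Hypothetical parser for the per-list signalling of Table 3.

    bits: iterator over already-entropy-decoded binary symbols (0/1).
    implicit_present[i] mirrors list_prediction_implicit_present_flag[i]:
    when set, the hypothesis from list i is always present, so no
    list_pred_idc_flag is coded for it.
    """
    pu = []
    for i in range(num_hypotheses):
        used = True if implicit_present[i] else bool(next(bits))
        entry = {"list_pred_idc_flag": int(used)}
        if used:
            # Per-list parameters follow only for enabled hypotheses; the
            # one-bit reads below stand in for the real ue(v)/mvd syntax.
            entry["ref_idx_list"] = next(bits)
            entry["mvp_list_flag"] = next(bits)
        pu.append(entry)
    return pu

# Two lists; list 0 implicitly present, list 1 enabled by its coded flag.
print(parse_prediction_unit(iter([1, 0, 1, 0, 1]), 2, [True, False]))
```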
- a coding syntax may impose a restriction on the minimum number of references that can be combined together.
- the syntax may restrict any hypothesis from list index i, with i>1, to be used only when a list with a lower index also is used or, alternatively, only if all lower-index lists are present. All of these restrictions could be supported or defined either by imposing them implicitly for all bit streams or by signalling the restrictions for multihypothesis prediction at a higher syntax level. As discussed below, aspects of the present disclosure accommodate such restrictions as multihypothesis modes.
- In another aspect, coding syntax for the prediction unit may be represented as follows, with the fields list_pred_idc_flag[i][x0][y0], ref_idx_list[i][x0][y0], and mvp_list_flag[i][x0][y0] carrying the meanings described above.
- coding syntax may employ additional techniques, as described below.
- motion parameters of a given hypothesis may be derived through relationship(s) with other hypotheses.
- motion parameters or their differential values may be set to have a value of zero using the mvd_list_zero_flag parameter.
- syntax may be designed so it can specify any of the following cases for implicit derivation of motion parameters for a list:
- Establish a mathematical relationship between the motion vectors of one list i and the motion vectors of another, earlier-in-presence list j (j<i).
- the relationship also may be related to the ref_idx and its associated picture order count, or to another parameter available in the codec that indicates a form of temporal-distance relationship between references.
- the relationship may be based on parameters signalled in a higher-level syntax, e.g. a linear model using parameters alpha (α) and beta (β). These parameters could also be vector valued, with different elements for the horizontal and vertical motion vector components. These parameters could then be used to weigh and offset the decoded motion parameters of list j.
- Other non-linear mathematical models may also be employed.
- Mathematical relationships also may be defined by relating a list to more than one other list. In that case, the motion vector may be computed, for example, as

  MV_i = Σ_(j<i) (α_j*MV_j + β_j). (Eq. 2)
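- A sketch of such an implicit derivation (the per-component gains and offsets, and their packaging as tuples, are illustrative assumptions):

```python
def derive_mv_linear(mvs_lower_lists, alphas, betas):
    """Implicitly derive the motion vector of list i as a weighted,
    offset combination of the decoded vectors of lower-index lists j<i,
    in the spirit of the linear model of Eq. 2."""
    mvx = mvy = 0.0
    for (jx, jy), (ax, ay), (bx, by) in zip(mvs_lower_lists, alphas, betas):
        mvx += ax * jx + bx   # alpha weighs, beta offsets, per component
        mvy += ay * jy + by
    return mvx, mvy

# Single lower list with a mirroring model (alpha=-1, beta=0), i.e. a
# symmetric vector pointing in the opposite temporal direction:
print(derive_mv_linear([(4.0, -2.0)], [(-1.0, -1.0)], [(0.0, 0.0)]))  # (-4.0, 2.0)
```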
- Motion vectors for a list could be derived also through motion vector chaining.
- the syntax may represent motion parameters for a block at location (x,y) in frame i using a reference j if, for another list, a motion vector is provided pointing to another reference k in which the co-located block, also at location (x,y), uses a motion vector that points to reference j. The chained vector may then be derived as
  MV_(i,j)(x,y) = MV_(i,k)(x,y) + MV_(k,j)(x,y). (Eq. 3)
- Motion vector chaining could be considered using multiple vectors, if these are available through other lists.
- a motion vector from a frame at time t may reference data in a frame at time t−2 by building an aggregate motion vector from a first motion vector that references an intermediate frame at time t−1 (e.g., a motion vector from time t to t−1) and a second motion vector that references the destination frame at time t−2 from the intermediate frame (e.g., a motion vector from time t−1 to t−2).
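- In code form, the chaining of Eq. 3 is simply a component-wise sum (sketch; the example vectors are illustrative):

```python
def chain_motion_vector(mv_i_to_k, mv_k_to_j):
    """Eq. 3: derive MV(i->j) by chaining through an intermediate
    reference k, i.e. MV_(i,j) = MV_(i,k) + MV_(k,j)."""
    return (mv_i_to_k[0] + mv_k_to_j[0], mv_i_to_k[1] + mv_k_to_j[1])

# Frame t -> t-2 via the intermediate frame t-1, as in the example above:
mv_t_to_t1 = (3, 1)    # current block, frame t -> t-1
mv_t1_to_t2 = (2, -1)  # co-located block, frame t-1 -> t-2
print(chain_motion_vector(mv_t_to_t1, mv_t1_to_t2))  # -> (5, 0)
```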
- Motion acceleration may also be considered to provide a basis to predict current motion vectors.
- the predictor of a motion vector for a block at location (x,y) in frame i may be computed from frame i−1, assuming that all previous partitions used reference frames at a distance of only 1, as follows:
  MV_(i,i−1)(x,y) = 2*MV_(i−1,i−2)(x,y) − MV_(i−2,i−3)(x,y) (Eq. 4)
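- Eq. 4 amounts to a linear extrapolation of the two preceding co-located vectors (sketch; the example values are illustrative):

```python
def predict_mv_with_acceleration(mv_prev, mv_prev2):
    """Eq. 4: linearly extrapolate the current vector from the two
    preceding co-located vectors, assuming unit reference distances:
    MV(i,i-1) = 2*MV(i-1,i-2) - MV(i-2,i-3)."""
    return (2 * mv_prev[0] - mv_prev2[0], 2 * mv_prev[1] - mv_prev2[1])

# Motion speeding up from (2,0) to (4,0) predicts (6,0) next.
print(predict_mv_with_acceleration((4, 0), (2, 0)))  # -> (6, 0)
```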
- motion vector derivation may be based on partitions in other frames and their actual motion path.
- Such a process involves higher complexity, since the projected area may not correspond to a block used for coding.
- the projected area may contain multiple subpartitions each having its own motion.
- encoders and decoders either may combine all the vectors together or may use all the vectors as independent hypotheses and subpartition the block being coded based on these hypotheses. Combining the vectors could be done using a variety of methods, e.g. straight averaging or weighted averaging.
- chaining could persist across multiple pictures, with the number of pictures to be chained constrained by the codec at a higher level, if desired. In such a case, if there is no usable vector, such as when a block is intra coded, chaining could terminate.
- Non-linear combinations of prediction samples may be performed instead of or in addition to using straight averaging or weighted averaging.
- different predictors h_i(x,y) may be combined non-linearly using an equation in which ref(x,y) represents an anchor predictor that is most correlated with the current signal and f is the non-linear function.
- the function could be for example a gaussian function or some other similar function.
- the anchor predictor can be chosen from the temporally most adjacent frame, or as the predictor associated with the smallest QP.
- the anchor predictor could also be indicated at a higher syntax structure, e.g. the slice header.
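- The exact form of f is not reproduced here; purely as an assumed illustration, a Gaussian weighting of each predictor's per-sample deviation from the anchor could look like:

```python
import numpy as np

def combine_nonlinear(hypotheses, ref, sigma=8.0):
    """Assumed illustration of a non-linear merge: each predictor h_i is
    weighted, per sample, by a Gaussian of its deviation from the anchor
    predictor ref(x, y), then the weights are normalized. The disclosed
    form of f may differ; sigma=8.0 is an arbitrary choice."""
    hs = np.stack([np.asarray(h, dtype=np.float64) for h in hypotheses])
    ref = np.asarray(ref, dtype=np.float64)
    w = np.exp(-((hs - ref) ** 2) / (2.0 * sigma ** 2))  # Gaussian weighting
    return (w * hs).sum(axis=0) / w.sum(axis=0)          # normalized merge

# A predictor far from the anchor receives little weight in the merge.
h0, h1 = np.full((4, 4), 100.0), np.full((4, 4), 160.0)
ref = np.full((4, 4), 102.0)
print(combine_nonlinear([h0, h1], ref)[0, 0])  # close to 100, not 130
```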
- the different predictors for multihypothesis prediction also can be weighted based on the location of the current sample and its distance from the blocks used for predicting the hypotheses' motion vectors.
- the first hypothesis may be weighted to have higher impact on samples that are closer to the left edge of the block and the second hypothesis may be weighted to have higher impact on samples that are closer to the top edge of the block.
- weights may be applied to each of these predictors.
- the temporal hypothesis may be assigned a single weight everywhere, whereas the hypothesis from the left may have relatively large weights on the left boundaries of the block with weight reductions at locations towards the right of the reconstructed block.
- Analogous weight distributions may be applied for the hypothesis at other locations (for example, the above block), where weights may be modulated based on directions of prediction.
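- A sketch of such position-dependent weighting (the linear ramps and the flat temporal weight are illustrative choices, not disclosed values):

```python
import numpy as np

def spatial_weights(h, w, source):
    """Per-sample weights for one hypothesis, decaying with distance from
    the block edge nearest the hypothesis' source ('left' or 'top')."""
    if source == "left":
        return np.tile(np.linspace(1.0, 0.0, w), (h, 1))           # strong at left edge
    if source == "top":
        return np.tile(np.linspace(1.0, 0.0, h)[:, None], (1, w))  # strong at top edge
    return np.full((h, w), 0.5)                                    # temporal: uniform weight

# Merge a temporal hypothesis with a left-derived hypothesis, normalizing
# the per-sample weights so they sum to one at every position:
h_temp, h_left = np.full((4, 4), 100.0), np.full((4, 4), 120.0)
w_temp, w_left = spatial_weights(4, 4, "temporal"), spatial_weights(4, 4, "left")
pred = (w_temp * h_temp + w_left * h_left) / (w_temp + w_left)
print(pred[0, 0])  # left hypothesis dominates at the left edge (~113.3)
```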
- Motion vectors for a particular list may be limited to values within a defined window around the (0,0) motion vector that is known to the decoder. Such limitations may contribute to better data-prefetch operations at a decoder for motion compensation using more than two hypotheses.
- Subpixel motion compensation may be limited for multihypothesis prediction.
- multihypothesis prediction may be limited to only one-eighth, quarter, half-pel, or integer precision.
- Integer prediction could be further restricted to certain integer multiples, e.g. multiples of 2, 3, 4, etc., or, in general, powers of 2. Such restrictions could be imposed on all lists if more than two hypotheses are used, or only on certain lists, e.g. lists i>1, or on list combinations, e.g. whenever list 2 is used, all lists are restricted to half-pel precision. If such precision is used, and if these methods are combined with implicit motion vector derivation from another list, appropriate quantization and clipping of the derived motion parameters may be performed.
- Limitations could also be imposed on the filters used for subpixel prediction, i.e. on the number of filter taps used to generate those samples. For example, for uni- or bi-prediction a coding syntax may permit 8-tap filters, but, for multihypothesis prediction with more than two references, only 4-tap or bilinear filtering may be permitted for all lists. The filters could also be selected based on the list index. The limitations could be based on block size as well, i.e. longer filters may be used for larger partitions, while filter sizes may be limited for smaller partitions. A sketch of the window and precision constraints follows.
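- (Illustrative values only: the ±64 window, the precision shift, and the floor-rounding choice are assumptions; the actual window and precision would be signalled or fixed by the codec.)

```python
def constrain_mv(mv, window=64, precision_shift=2):
    """Clamp a motion vector to a +/-window around (0,0) and quantize it
    to a coarser grid, e.g. shift=2 keeps only every 4th position of the
    stored sub-pel units (integer-pel if vectors are in quarter-pel)."""
    def one(c):
        c = max(-window, min(window, c))   # prefetch-friendly window clamp
        step = 1 << precision_shift
        return (c // step) * step          # floor-rounds toward -infinity
    return (one(mv[0]), one(mv[1]))

print(constrain_mv((70, -5)))  # -> (64, -8): clamped, then snapped to grid
```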
- Different ones of the foregoing techniques may be applied in a coding syntax for prediction blocks of different sizes. For example, for larger partitions, the cost of signalling additional motion parameters is small; for smaller partitions, that cost can be significant. Therefore, for larger partitions, e.g. 16×16 and above, explicit motion signalling of lists i>1 may be permitted while, for partition sizes smaller than 16×16, derivation may be implicit.
- merge and skip modes can also be supported in such codecs, and the derivation of the appropriate motion parameters discussed above may be extended to these modes as well.
- HEVC and some other coding systems partition a frame of input data first into a Coding Tree Unit (CTU), also called the Largest Coding Unit (commonly, an "LCU"), of a predetermined size (say, 64×64 pixels).
- Each CTU may be partitioned into increasingly smaller pixel blocks based on the content contained therein.
- the relationships of the CTUs and the coding units ("CUs") contained within them may be represented as a hierarchical tree data structure.
- prediction units (“PUs”) may be provided for the CUs at leaf nodes of the tree structure; the PUs may carry prediction information such as coding mode information, motion vectors, weighted prediction parameters, and the like.
- aspects of the present disclosure extend multi-hypothesis prediction to provide for different coding hypotheses at different block sizes.
- a block of size (2M) ⁇ (2M) can be designated as a bi-predicted block but, for each M ⁇ M subblock, different motion parameters can be signalled for each hypothesis/list.
- an encoder can keep one of the hypotheses fixed for a larger area, while searching for the second hypothesis using smaller regions.
- a decoder may use hypothesis information to determine whether it should “prefetch” the entire larger area in memory for generating the first hypothesis, whereas for the second hypothesis smaller areas could be brought into memory and then combined with the first hypothesis to generate the final prediction.
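- The following C sketch illustrates one way such a final prediction could be assembled, assuming a (2M)×(2M) first hypothesis, four M×M second-hypothesis subblocks in raster order, and equal weighting; the buffer layout and the 1/2-1/2 weights are our assumptions:

    #include <stdint.h>

    static void combine_hierarchical(const uint8_t *pred0, const uint8_t *pred1[4],
                                     uint8_t *out, int size /* = 2M */)
    {
        int half = size / 2;
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                int k  = (y >= half ? 2 : 0) + (x >= half ? 1 : 0); /* subblock */
                int sx = x % half, sy = y % half;
                int p0 = pred0[y * size + x];           /* first hypothesis  */
                int p1 = pred1[k][sy * half + sx];      /* second hypothesis */
                out[y * size + x] = (uint8_t)((p0 + p1 + 1) >> 1);
            }
        }
    }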
- the parameter inter_pred_idc[x0][y0] may be expanded to indicate that additional subpartitioning is permitted for a block for any of the hypotheses and lists, i.e. List0, List1, or both. If such subpartitioning is permitted for only one of the hypotheses/lists, e.g. List0, then for the other list, e.g. List1, its associated parameters can be signalled immediately, e.g. within a prediction unit, whereas for the first hypothesis/list the signalling of said parameters would have to be done in combination with its partitioning.
- subpartitions may include syntax elements to identify a null prediction which indicates that the additional hypothesis from the current list will not be considered for this subpartition. It may also be possible to signal syntax elements that will indicate the use of intra prediction instead of inter prediction in combination with or instead of other inter hypotheses for that subpartition.
- the hierarchical nature of the codec may induce overhead for signalling the prediction mode information, which ordinarily is provided in prediction units that are identified from leaf nodes of coding unit trees.
- aspects of the present disclosure provide several techniques to reduce signalling overhead.
- the signalling of inter_pred_idc may be provided at higher levels of the coding_quadtree syntax than the prediction unit.
- the following syntax tables demonstrate an example of how this could be done.
- a parameter named “quadtree_mode_enable_flag” may be provided at a sequence, picture, or slice level in a coding syntax that indicates whether the inter_pred_idc designations are provided at higher levels within the coding_quadtree syntax. If the quadtree_mode_enable_flag value is set to TRUE (1), for example, it may indicate that inter_pred_idc designations are so provided.
- the encoder may signal the inter prediction mode and the split flags in such higher level constructs, for example, in sequence, picture, or slice level itself, thus reducing the signalled overhead.
- intra prediction modes are not used within subpartitions, thus reducing overhead further.
- Table 5 provides an exemplary coding quadtree syntax provided in an HEVC-style coding application.
- the values split_cu_l0_flag[x0][y0], split_cu_l1_flag[x0][y0] and split_cu_flag[x0][y0] respectively indicate whether and how a CU may be further partitioned.
- the inter_pred_idc[x0][y0] may identify a prediction mode of all sub-partitions contained within the CU where it is identified.
- Table 6 illustrates an exemplary quadtree splitting syntax, again working from an HEVC-style coding syntax. This new syntax is utilized when the independent splitting mode for each list is enabled; otherwise, the original quadtree method, as used in HEVC, applies. As indicated, coding_quadtree_list values may be derived from the inter_pred_idc value, which is signalled in the coding_quadtree syntax. The corresponding list is also passed into this syntax, since it will later be used to determine which information, i.e. motion vectors from which corresponding list, will need to be signalled.
- Table 7 illustrates coding_quadtree_list syntax, which shows a further use of the split_cu_flag to indicate whether the respective coding unit for the list current_list may be split further for multihypothesis coding.
- the current_list value is passed from higher level syntactic elements.
- Table 8 illustrates an exemplary coding unit list syntax, which makes use of the current_list value as passed from higher level syntactic elements.
- derivation of prediction_unit_list parameters may use the current_list value from higher level syntactic elements.
- Table 9 illustrates an exemplary syntax for coding prediction units in an aspect of the present disclosure.
- values of ref_idx_l0, mvp_l0_flag, ref_idx_l1, and mvp_l1_flag are signalled by the encoder and decoded at the decoder only for the list current_list, which is passed to the prediction_unit_list process from higher level syntactic elements. This is different from legacy prediction units, where the mode was indicated within the prediction unit itself and would indicate whether list0 only, list1 only, or biprediction shall be used, with the related parameters signalled together within the same prediction unit.
- Transform coding, for example in HEVC, can be performed independently of the specification of the prediction units.
- overhead signalling may be limited even further by constraining subpartitioning in hierarchical multihypothesis prediction to one and only one of the lists.
- the partitioning may be fixed. In such a case, no special indication flag would be needed, and a single set of motion parameters would be sent for one list, whereas, for the other list(s), the syntax may accommodate further subpartitioning and signalling of additional parameters.
- Such aspects may be built into coding rules for the syntax, in which case the constraint would be enforced always.
- the constraint may be activated dynamically by an encoder, in which case it would also be signalled at higher-level constructs within a coding syntax (e.g., the sequence, picture, or slice headers).
- the depth of subpartitioning of each list may be signalled at a similar level. Such depth could also be associated with the depth of the coding unit, in which case further subdivision of a coding unit would not occur beyond a certain block size.
- Such constraints may have an impact on when an encoder sends split flag parameters in the stream. If for example, an encoder reaches the maximum splitting depth, no additional split flags need to be transmitted.
- coding syntax may include elements that identify prediction lists that are to be used for pixel blocks.
- Some coding protocols organize prediction references for bipredictively-coded pixel blocks into lists, commonly represented as "List0" and "List1," and identify reference frames for prediction using indexes into such lists.
- aspects of the present disclosure introduce signalling at higher levels of a syntax (e.g., at the sequence, picture, slice header, coding tree unit level (or coding unit groups of a certain size in a codec such as HEVC, or macroblocks in a codec like MPEG-4 AVC)), of whether a given list prediction is present in a lower coding unit within the higher level.
- the signalling may be provided as an “implicit prediction” flag that, when activated for a given list, indicates that the list is to be used for coding of the pixel blocks referenced by the sequence, slice, etc.
- Table 10 provides an example of syntax that may indicate whether prediction lists are present for bipredictively-coded pixel blocks.
- the syntax is provided as part of signalling of a slice.
- list0_prediction_implicit_present_flag or list1_prediction_implicit_present_flag indicates that the respective list (e.g., List0, List1) data is always present for coding of the pixel blocks contained within the slice.
- the syntax may be provided for other syntactic elements in coded video data, such as a sequence, a picture, a coding tree unit level or other coding unit groups. Table 12 below also illustrates application of the implicit prediction flag syntax to an exemplary HEVC-style slice syntactic element.
- If the list1_prediction_implicit_present_flag is set TRUE, a block in that slice would be either a unipredicted block using List1 or a bipredicted block using List1 and another prediction reference. If both flags are set TRUE, then the corresponding blocks can only use biprediction using both List0 and List1.
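- The inference available to a decoder from these two flags can be summarized as in the sketch below; the flag-to-mode mapping follows the text, while the bitmask encoding is ours:

    enum { ALLOW_L0 = 1, ALLOW_L1 = 2, ALLOW_BI = 4 };

    /* Derive the permitted prediction types for blocks of a slice from the
     * implicit-presence flags of List0 and List1. */
    static int allowed_pred(int list0_implicit, int list1_implicit)
    {
        if (list0_implicit && list1_implicit) return ALLOW_BI;
        if (list0_implicit) return ALLOW_L0 | ALLOW_BI;   /* List0 always used */
        if (list1_implicit) return ALLOW_L1 | ALLOW_BI;   /* List1 always used */
        return ALLOW_L0 | ALLOW_L1 | ALLOW_BI;            /* unconstrained */
    }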
- implicit prediction flags may be extended to MHMC coding using a higher number N of hypotheses (N>2). In such a case, implicit prediction flags may be provided for the list corresponding to each hypothesis.
- a list_prediction_implicit_present_flag[i] is shown, for example, in Table 1. When a list_prediction_implicit_present_flag[i] is set for a hypothesis i, it signifies that the associated prediction list (e.g., List i) will be used for coding of pixel blocks contained within the respective syntactic elements (e.g., the sequence, picture, slice, etc.).
- Table 11 indicates how syntax of an HEVC-style prediction unit might be simplified using the implicit prediction flags:
- when the list0_prediction_implicit_present_flag is set to TRUE, the prediction unit syntax need not include list0_pred_idc_flag[x0][y0], which specifies whether list0 is used for the current prediction unit; its use is already known from the higher layer. Similarly, when the list1_prediction_implicit_present_flag is set to TRUE, the prediction unit syntax will not include the flag list1_pred_idc_flag[x0][y0]. As noted in Table 11, when list0 is not to be used, it is implicitly signalled that list1 has to be used, and thus signalling of list1_pred_idc_flag[x0][y0] can be avoided in such a case.
- Table 12 illustrates the syntax of Table 1 applied to an exemplary slice segment header, which is derived from the HEVC protocol:
- syntax may be developed to identify non-square prediction block sizes, for example, with parameters to define width and height of the minimum and maximum prediction block sizes.
- the slice header syntax of Table 12 may lead to a syntax for a prediction unit that can handle multihypothesis slices, as shown in Table 13.
- coding syntax rules may be to set MvdList to zero only when a block is predicted by more than one list, regardless of which lists are used. For example, the following condition may be applied:
- signalling may be provided at a lower level than a slice unit, that identifies which lists are to be used by also restricting the reference indices of any list[i] to one. Doing so may save bits for uni-prediction or multihypothesis prediction.
- Such signalling could be applied, for example at fixed CTU intervals.
- signalling may be provided for N CTUs with N potentially specified at a higher level (such as the sequence parameter sets, picture parameter sets, or a slice, at a CTU level or even lower), but could also be signalled in a new coding unit, e.g. a CTU Grouping unit, that can have an arbitrary number of CTUs, e.g. M, associated with it.
- Such a number could be signalled within the CTU Grouping unit, but it could also be omitted, with the CTU Grouping unit containing appropriate termination syntax (start and end codes) that allows one to determine how many CTUs are contained within.
- coding syntax may provide additional information within a CTU grouping, such as follows:
- An encoder also may enable or disable list indices for uni-prediction or multihypothesis prediction if desired, including indications of whether all reference indices or only one will be used. An encoder may provide such indications independently or jointly for uni-prediction and multihypothesis prediction (i.e. for uni-prediction all references of a particular list could be used and only one for multi-hypothesis prediction, if both modes are enabled).
- Other aspects of the present disclosure provide techniques for motion vector prediction both for bipredictive multihypothesis motion compensation (e.g., up to 2 hypotheses) and for multi-hypotheses prediction (N>2).
- a coder may select to use, as predictors, one or more motion vectors of the current block from a list or lists that have already been signalled in the bit stream. For example, assume that, by the time list1 motion vectors are to be coded, list0 motion vector parameters have already been signalled for the current block.
- the list1 motion vectors for the same block may be predicted directly from information of the list0 coding, instead of from the surrounding partitions. That is, an encoder may compute a predictor vector as

mvp_L1 = a · mv_L0 + δ

where a is a scaling factor that relates to the distances of the references in L0 and L1 compared to the current picture, and δ is a constant vector that could be sent at a higher level syntax structure (e.g. the sequence parameter sets, picture parameter sets, or the slice header).
- The value of a, for example, can be computed using the picture order count (POC) information associated with each picture.
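- A sketch of one plausible derivation, patterned after the POC-distance scaling used for temporal motion vector prediction in HEVC; the 8-bit fixed-point form, the rounding, and the function shape are assumptions:

    /* Predict one component of the list1 vector from the coded list0 vector.
     * POC distances are nonzero since a reference cannot share the current
     * picture's POC. delta is the constant signalled at a higher level. */
    static int mvp_from_other_list(int mv_l0, int poc_cur, int poc_ref_l0,
                                   int poc_ref_l1, int delta)
    {
        int db = poc_cur - poc_ref_l0;          /* distance to the L0 reference */
        int dp = poc_cur - poc_ref_l1;          /* distance to the L1 reference */
        int scale = (dp * 256) / db;            /* a in 1/256 fixed point */
        int prod = mv_l0 * scale;
        return (prod >= 0 ? prod + 128 : prod - 128) / 256 + delta;
    }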
- this predictor also may be signalled implicitly, i.e. enabled always for a partition. Further, an encoder may signal which predictor should be used from a possible list of motion vector predictor candidates using a parameter such as a flag. If, for example, this parameter or flag is set to 0, then the collocated vector of the same block from the other list would be used; otherwise, another predictor, e.g. a predictor based on a similar derivation as in HEVC, or a median or averaging candidate of the spatial or temporal neighbourhood motion vectors, could be used.
- prediction could come from list0, if that prediction is always available, or from the lowest-indexed available list used for prediction.
- signalling may be provided to identify the list that is to be used for predicting the parameters of the current list. Such signalling could be done globally (e.g. at a higher syntax element such as the sequence or picture parameter sets, or the slice header), where an encoder may signal how vectors of a particular list, when used in multihypothesis prediction, would be predicted, or done at the block level using a parameter similar to conventional mvp_lX_flag parameters.
- an encoder may combine the parameters mvp_lX_flag into a single parameter that signals predictors jointly for all lists. This aspect may reduce signalling overhead as compared to cases in which signalling is provided independently for each hypothesis. That is, instead of signalling that L0 and L1 would each use the first predictor by coding the value 0 for mvp_l0_flag and mvp_l1_flag independently, a single new parameter, mvp_selector, may be coded; if its value is 0, both lists use the same first predictor.
- If the value is 1, then both are selected from the second predictor, if that is available, whereas if its value is equal to 2, then L0 uses its first predictor and L1 its second predictor, and so on.
- Such correspondence between the value of mvp_selector and which predictor to use for each list could be pre-specified as a syntax coding rule, or it could also be signalled inside the bit stream at a higher syntax element (e.g. in the sequence or picture parameter sets, or the slice header).
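- One possible pre-specified correspondence for the two-list case is sketched below; the enumeration order of predictor pairs is an assumption and, as noted, could instead be signalled in the bit stream:

    /* Map mvp_selector to a predictor index for each of the two lists:
     * 0 -> (first, first), 1 -> (second, second),
     * 2 -> (first, second), 3 -> (second, first). */
    static void decode_mvp_selector(int mvp_selector, int *mvp_idx_l0, int *mvp_idx_l1)
    {
        static const int map[4][2] = { {0, 0}, {1, 1}, {0, 1}, {1, 0} };
        *mvp_idx_l0 = map[mvp_selector & 3][0];
        *mvp_idx_l1 = map[mvp_selector & 3][1];
    }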
- the mvp_selector could be used with more than 2 lists.
- an encoder may entropy code the element mvp_selector using an entropy coding scheme such as UVLC (Exp-Golomb coding) or Context Adaptive Binary Arithmetic Coding (CABAC).
- occurrence statistics may be used to update probability models and thus how the value is encoded using an adaptive arithmetic coding engine.
- the statistics collected would be impacted by how many lists are enabled each time. If a single list is to be used, there are only two possible state values, for example, whereas if two lists are used, then more states are needed.
- an encoder may maintain a table of mvp_selector parameters depending on the number and/or characteristics of combined lists. Adaptation for entropy coding of each entry in the table may be done independently. In another aspect, a single entry may be maintained, but the statistics may be normalized as coding is performed based on the number of lists used for the current block. If, for example, only uniprediction is to be used, values for mvp_selector above 1 would not make sense. In that case, the normalization would be based only on the number of occurrences of values 0 and 1.
- Prediction of motion vectors could also be performed jointly using both neighbouring (spatial and/or temporal) and collocated (motion vectors for the current block but from another list) candidates.
- prediction may use both the neighbouring predictor and the predictor derived from the list0 vector together with its motion vector residual error, to which a scaling factor for the mvd may be applied.
- syntax rules may impose the use of a single predictor per list and identify, for the current block, which earlier-encoded list could be used to predict a subsequent list.
- the syntax still may permit the use of more than one list to predict the current list. This could be done, for example, using averaging, either linear or nonlinear.
- predictors may be derived as:

mvp = Σ_i w_i · mvp_i

where the w_i are weights that may be either pre-specified by syntax rules or signalled at a higher coding level, and the mvp_i are the candidate predictors taken from the earlier-coded lists.
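- For one vector component, the weighted combination might look like the sketch below, assuming weights expressed in 1/16 units that sum to 16:

    /* Form a predictor as a weighted average of the n motion vector predictor
     * candidates mv[] taken from earlier-coded lists. */
    static int weighted_mvp(const int mv[], const int w16[], int n)
    {
        int acc = 0;
        for (int i = 0; i < n; i++)
            acc += w16[i] * mv[i];                   /* weights sum to 16 */
        return (acc >= 0 ? acc + 8 : acc - 8) / 16;  /* round to nearest */
    }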
- signalling of which mvds to use from earlier lists may be done using an mvp_selector parameter signalled in the prediction unit.
- an encoder may use a single predictor for multiple hypotheses or tie the computation of the predictors for multiple hypotheses together.
- Such coding decisions may be signalled in the coding bit stream or may be defined by coding rules of its syntax.
- mvp_l0[ref_idx_l0] and mvp_l1[ref_idx_l1] can be tied together. This could be done by signalling, at a higher level syntax structure (e.g. the sequence level, the picture level, or the slice level), which lists and which references will be accounted for at the same time. Such consideration could also be made at the level of a group of CTUs.
- the relationship between the two mvps could be derived by using a simple scaling factor (scale_factor) that could be signalled explicitly or, alternatively, could be determined by POC differences of the two references.
- a decoder can form both at the same time from a combined candidate set. This can be achieved in a variety of ways; for instance, if mvp_l0[ref_idx_l0] is computed as the median vector of three candidates, the decoder can re-formulate that as minimizing a sum over three terms.
- for the combined candidate set, the decoder minimizes a similar sum with six terms.
- mvp_l1[ref_idx_l1] can then be computed as scale_factor*mvp_l0[ref_idx_l0].
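- The POC-derived form of the tie can be sketched as follows; the 8-bit fixed-point representation of scale_factor is an assumption:

    /* Derive scale_factor from POC differences and tie mvp_l1 to mvp_l0. */
    static int tied_mvp_l1(int mvp_l0, int poc_cur, int poc_l0, int poc_l1)
    {
        int scale256 = ((poc_cur - poc_l1) * 256) / (poc_cur - poc_l0);
        int prod = mvp_l0 * scale256;
        return (prod >= 0 ? prod + 128 : prod - 128) / 256;
    }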
- Other motion vectors such as mvp_l0[0:L] can be determined together in a similar fashion. This concept can also be extended to multiple lists (>2).
- motion vectors of a given pixel block may be predicted from other pixel blocks that are neighbours of the pixel block being coded.
- a bi-predictive or multihypothesis candidate may be based on pixel blocks in a spatial neighbourhood of the block being coded, which may be represented by an index j into a merge_idx list. This candidate j is associated with motion vectors and references for all its lists, e.g. L0 and L1.
- a coder may introduce an additional candidate (call it j+1), which uses the same predictor for L0 but, for L1, reuses only the reference index.
- an encoder may scale its motion vectors based on relative temporal displacement between the L1 and the L0 candidates.
- an encoder may select a first, unipredicted candidate (e.g. L0), for skip or merge, which has a consequence that no neighbours would provide candidates for another list.
- an encoder may indicate, using merge_idx, a derived candidate for another list (say, L1) given its L0 candidate, which enables biprediction or multihypothesis prediction using another merge_idx.
- the foregoing techniques may be extended to coding of Overlapped Block Motion Compensation with multihypothesis prediction as well as affine block motion compensation, for coding systems that support those features.
- Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays, and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones, or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic-, and/or optically-based storage devices, where they are read to a processor and executed.
- Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
- Video coders and decoders may exchange video through channels in a variety of ways. They may communicate with each other via communication and/or computer networks as illustrated in FIG. 1 . In still other applications, video coders may output video data to storage devices, such as electrical, magnetic and/or optical storage media, which may be provided to decoders sometime later. In such applications, the decoders may retrieve the coded video data from the storage devices and decode it.
Abstract
The present disclosure describes techniques for coding and decoding video in which a plurality of coding hypotheses are developed for an input pixel block of frame content. Each coding hypothesis may include generation of prediction data for the input pixel block according to a respective prediction search. The input pixel block may be coded with reference to a prediction block formed from prediction data derived according to a plurality of hypotheses. Data of the coded pixel block may be transmitted to a decoder via a channel along with data identifying the number of hypotheses used during the coding. At a decoder, an inverse process may be performed, which may include generation of a counterpart prediction block from prediction data derived according to the hypotheses identified with the coded pixel block data, then decoding of the coded pixel block according to the prediction data.
Description
- The present application benefits from priority conferred by application Ser. No. 62/625,547, entitled “Techniques of Multi-Hypothesis Motion Compensation” and filed on Feb. 2, 2018, as well as application Ser. No. 62/626,276, also entitled “Techniques of Multi-Hypothesis Motion Compensation” and filed on Feb. 5, 2018, the disclosures of which are incorporated herein in their entirety.
- The present disclosure relates to video coding and, in particular, to video coding techniques that employ multi-hypothesis motion compensation coding (also, “MHMC” coding).
- Modern video compression systems, such as MPEG-4 AVC, HEVC, VP9, VP8, and AV1, often employ block-based strategies for video compression. In particular, a rectangular video region (e.g. an image, a tile, or rows within such regions) is partitioned into rectangular or square blocks (called "pixel blocks" for convenience), and, for each pixel block, a different prediction mode is specified.
- MHMC is an inter prediction method where prediction of pixel blocks within a region such as a frame, tile, or slice, may be coded several different ways (called “hypotheses”) and the hypotheses are combined together. Thus, a prediction block ŝ for an input pixel block s can be generated using a set of N hypotheses hi as follows:
- ŝ = Σ_{i=0}^{N−1} w_i · h_i  (Eq. 1)

where the w_i are weighting factors applied to the respective hypotheses.
- Each hypothesis commonly is associated with a reference index, motion vector parameters, and, in some instances, illumination compensation (weighted prediction) parameters. Motion vectors indicate displacement information, i.e. the displacement of the current block in relationship to the reference frame identified for their respective hypotheses. Commonly, such displacement is limited to only translational information. Current MHMC coding systems merge hypotheses via linear models as shown in Eq. 1.
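- In implementation terms, the Eq. 1 merge reduces to a per-sample weighted sum. A minimal C sketch, assuming integer weights normalized to sum to 64 (the fixed-point scale is our assumption):

    #include <stdint.h>

    /* Combine n hypothesis samples h[] with weights w64[] that sum to 64. */
    static uint8_t combine_hypotheses(const uint8_t h[], const int w64[], int n)
    {
        int acc = 32;                 /* rounding offset for /64 normalization */
        for (int i = 0; i < n; i++)
            acc += w64[i] * h[i];
        return (uint8_t)(acc >> 6);   /* normalize back to sample range */
    }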
- Modern codecs, such as AVC and HEVC, do not fix the number of hypotheses for a block in a multihypothesis-predicted frame. Instead, a block can use a single hypothesis, several hypotheses, or even be predicted using intra prediction. Currently, the maximum number of hypotheses that could be used is limited to two. This is mostly due to the cost of signalling the associated parameters relating to the prediction process, but also due to the complexity and bandwidth of reading data and reconstructing the final prediction. At lower bitrates, for example, mode and motion information can dominate the overall bitrate of an encoded stream. On the other hand, increasing the number of hypotheses even slightly, i.e. from two to three, could result in a considerable (e.g., 33%) increase in bandwidth in a system, and therefore impact power consumption. Data prefetching and memory management can become more complex and costly. Current coding protocols do not always provide efficient mechanisms to communicate video coding parameters between encoders and decoders.
- For example, in many video coding systems, the indication of prediction lists to be used in B slices is performed using a single parameter, which is sent at the prediction unit level of the coding syntax. Such a parameter may indicate whether a block will be using List0 prediction, List1 prediction, or bi-prediction. This type of signalling may also include indication of other coding modes, such as the SKIP and DIRECT modes, intra, etc. This method, therefore, incurs cost in terms of the signalling overhead to communicate these parameters, and the cost can be expected to increase if MHMC coding techniques were employed that expand the number of hypotheses to levels greater than two. Similarly, the overhead cost of signalling other coding parameters, such as prediction mode and motion vectors, would increase if MHMC coding techniques expand the number of hypotheses to levels greater than two.
- Conventionally, modern MHMC coding systems are designed so that all hypotheses use exactly the same partitioning and are not permitted to change within said partitioning. That is, if a block of size 16×16 is said to be utilizing bi-prediction, then a single set of parameters (e.g., reference indices, motion vectors, and weighted prediction parameters) is signalled and used for each hypothesis.
- The inventors perceive a need for techniques to improve MHMC coding operations that provide greater flexibility to coding systems to represent image data in coding.
FIG. 1 illustrates a simplified block diagram of a video delivery system 100 according to an aspect of the present disclosure. -
FIG. 2 is a functional block diagram illustrating components of an encoding terminal. -
FIG. 3 is a functional block diagram illustrating components of a decoding terminal according to an aspect of the present disclosure. -
FIG. 4 illustrates exemplary application of MHMC coding. -
FIGS. 5(a) and 5(b), respectively, illustrate exemplary operation of multi-hypothesis motion compensation coding according to an aspect of the present disclosure. -
FIG. 6 is a functional block diagram of a coding system according to an aspect of the present disclosure. -
FIG. 7 is a functional block diagram of a decoding system according to an aspect of the present disclosure. - Aspects of the present disclosure provide techniques for coding and decoding video in which a plurality of coding hypotheses are developed for an input pixel block of frame content. Each coding hypothesis may include generation of prediction data for the input pixel block according to a respective prediction search. The input pixel block may be coded with reference to a prediction block formed from prediction data derived according to a plurality of hypotheses. Data of the coded pixel block may be transmitted to a decoder via a channel along with data identifying the number of hypotheses used during the coding. At a decoder, an inverse process may be performed, which may include generation of a counterpart prediction block from prediction data derived according to the hypotheses identified with the coded pixel block data, then decoding of the coded pixel block according to the prediction data.
FIG. 1 illustrates a simplified block diagram of a video delivery system 100 according to an aspect of the present disclosure. The system 100 may include a plurality of terminals 110, 120. A first terminal 110 may capture video data locally, code the video data and transmit the coded video data to the counterpart terminal 120 via a channel. The receiving terminal 120 may receive the coded video data, decode it, and render it locally, for example, on a display at the terminal 120. If the terminals are engaged in bidirectional exchange of video data, then the terminal 120 may capture video data locally, code the video data and transmit the coded video data to the counterpart terminal 110 via another channel. The receiving terminal 110 may receive the coded video data transmitted from terminal 120, decode it, and render it locally, for example, on its own display. The processes described can operate on both frame and field frame coding but, for simplicity, the present discussion will describe the techniques in the context of integral frames. - A video coding system 100 may be used in a variety of applications. In a first application, a terminal 110 may code pre-produced video (for example, television or movie programming) and store the coded video for delivery to one or, often, many downloading clients (e.g., terminal 120). Thus, the video being coded may be live or pre-produced, and the terminal 110 may act as a media server, delivering the coded video according to a one-to-one or a one-to-many distribution model. For the purposes of the present discussion, the type of video and the video distribution schemes are immaterial unless otherwise noted. - In
FIG. 1 , the terminals are shown interconnected via a network 130. - The
network 130 represents any number of networks that convey coded video data between the terminals. -
FIG. 2 is a functional block diagram illustrating components of an encoding terminal according to an aspect of the present disclosure. The encoding terminal may include avideo source 210, animage processor 220, acoding system 230, and atransmitter 240. Thevideo source 210 may supply video to be coded. Thevideo source 210 may be provided as a camera that captures image data of a local environment, a storage device that stores video from some other source or a network connection through which source video data is received. Theimage processor 220 may perform signal conditioning operations on the video to be coded to prepare the video data for coding. For example, thepreprocessor 220 alter the frame rate, frame resolution, and/or other properties of the source video. Theimage processor 220 also may perform filtering operations on the source video. - The
coding system 230 may perform coding operations on the video to reduce its bandwidth. Typically, thecoding system 230 exploits temporal and/or spatial redundancies within the source video. For example, thecoding system 230 may perform motion compensated predictive coding in which video frame or field frames are parsed into sub-units (called “pixel blocks,” for convenience), and individual pixel blocks are coded differentially with respect to predicted pixel blocks, which are derived from previously-coded video data. A given pixel block may be coded according to any one of a variety of predictive coding modes, such as: -
- intra-coding, in which an input pixel block is coded differentially with respect to previously coded/decoded data of a common frame;
- single prediction inter-coding, in which an input pixel block is coded differentially with respect to data of a previously coded/decoded frame; and
- multi-hypothesis motion compensation predictive coding, in which an input pixel block is coded predictively using decoded data from two or more sources, via temporal or spatial prediction.
The predictive coding modes may be used cooperatively with other coding techniques, such as Transform Skip coding, RRU coding, scaling of prediction sources, palette coding, and the like.
- The
coding system 230 may include aforward coder 232, adecoder 233, an in-loop filter 234, aframe buffer 235, and apredictor 236. Thecoder 232 may apply the differential coding techniques to the input pixel block using predicted pixel block data supplied by thepredictor 236. Thedecoder 233 may invert the differential coding techniques applied by thecoder 232 to a subset of coded frames designated as reference frames. The in-loop filter 234 may apply filtering techniques to the reconstructed reference frames generated by thedecoder 233. Theframe buffer 235 may store the reconstructed reference frames for use in prediction operations. Thepredictor 236 may predict data for input pixel blocks from within the reference frames stored in the frame buffer. - The
transmitter 240 may transmit coded video data to a decoding terminal via a channel CH. -
FIG. 3 is a functional block diagram illustrating components of a decoding terminal according to an aspect of the present disclosure. The decoding terminal may include areceiver 310 to receive coded video data from the channel, avideo decoding system 320 that decodes coded data, a post-processor 330, and avideo sink 340 that consumes the video data. - The
receiver 310 may receive a data stream from the network and may route components of the data stream to appropriate units within theterminal 300. AlthoughFIGS. 2 and 3 illustrate functional units for video coding and decoding,terminals 110, 120 (FIG. 1 ) often will include coding/decoding systems for audio data associated with the video and perhaps other processing units (not shown). Thus, thereceiver 310 may parse the coded video data from other elements of the data stream and route it to thevideo decoder 320. - The
video decoder 320 may perform decoding operations that invert coding operations performed by the coding system 140. The video decoder may include adecoder 322, an in-loop filter 324, aframe buffer 326, and apredictor 328. Thedecoder 322 may invert the differential coding techniques applied by the coder 142 to the coded frames. The in-loop filter 324 may apply filtering techniques to reconstructed frame data generated by thedecoder 322. For example, the in-loop filter 324 may perform various filtering operations (e.g., de-blocking, de-ringing filtering, sample adaptive offset processing, and the like). The filtered frame data may be output from the decoding system. Theframe buffer 326 may store reconstructed reference frames for use in prediction operations. Thepredictor 328 may predict data for input pixel blocks from within the reference frames stored by the frame buffer according to prediction reference data provided in the coded video data. - The post-processor 330 may perform operations to condition the reconstructed video data for display. For example, the post-processor 330 may perform various filtering operations (e.g., de-blocking, de-ringing filtering, and the like), which may obscure visual artifacts in output video that are generated by the coding/decoding process. The post-processor 330 also may alter resolution, frame rate, color space, etc. of the reconstructed video to conform it to requirements of the
video sink 340. - The
video sink 340 represents various hardware and/or software components in a decoding terminal that may consume the reconstructed video. Thevideo sink 340 typically may include one or more display devices on which reconstructed video may be rendered. Alternatively, thevideo sink 340 may be represented by a memory system that stores the reconstructed video for later use. Thevideo sink 340 also may include one or more application programs that process the reconstructed video data according to controls provided in the application program. In some aspects, the video sink may represent a transmission system that transmits the reconstructed video to a display on another device, separate from the decoding terminal; for example, reconstructed video generated by a notebook computer may be transmitted to a large flat panel display for viewing. - The foregoing discussion of the encoding terminal and the decoding terminal (
FIGS. 2 and 3 ) illustrates operations that are performed to code and decode video data in a single direction between terminals, such as fromterminal 110 to terminal 120 (FIG. 1 ). In applications where bidirectional exchange of video is to be performed between theterminals FIG. 2 ) and each terminal 110, 120 will possess the functional units associated with a decoding terminal (FIG. 3 ). Indeed, in certain applications,terminals FIG. 2 ) provided therein. Such implementations are fully consistent with the present discussion. -
FIG. 4 illustrates exemplary application of MHMC coding. InFIG. 4 , ablock 412 in thecurrent frame 410 may be coded with reference to N reference frames 420.1-420.n-1. Motion vectors mv1-mvn may be derived that reference prediction sources in the reference frames 420.1-420.n-1. The prediction sources may be combined as per Eq. 1 to yield a prediction block ŝ that serves as a predictor forblock 412. - In an aspect of the present disclosure, coders may develop coding hypotheses for an input block using different block sizes. One set of exemplary coding hypotheses is shown in
FIGS. 5(a) and 5(b) . InFIG. 5(a) , an input block 512 from aninput frame 510 may be coded using coding hypotheses that use different block sizes. A first hypothesis is represented by motion vector mv1, which codes the input block 512 with reference to ablock 522 in afirst reference frame 520; thereference block 522 has a size equal to the size of block 512. A second set of hypotheses is represented by motion vectors mv2.1-mv2.4. For the second set of hypotheses, the input block 512 is partitioned into a plurality of sub-blocks (here, four sub-blocks 514.1-514.4), each of which are predictively coded. For ease of illustration,FIG. 5(a) illustrates each of the sub-blocks 514.1-514.4 coded with reference to content from acommon reference frame 530, shown as reference blocks 532.1-532.4. In practice, each of the sub-blocks may be predicted independently of the other sub-blocks, which may cause prediction references to be selected from the same or different reference frames (not shown). -
FIG. 5(b) illustrates the relationship between the coding hypotheses and aprediction block ŝ 540 that will be used for coding and decoding. Content of the various hypotheses may be merged together to generate the prediction block. In the example ofFIG. 5(b) , theprediction block 540 may be generated from prediction data contributed byblock 522 of the first hypothesis and prediction data contributed by the sub-blocks 532.1-532.4, which may be weighted according to respective weight factors as shown in Eq. 1. - Although
FIGS. 5(a) and 5(b) illustrate coding by two hypotheses (the second hypothesis being formed from predictions of four sub-blocks), in principle, MHMC coding may be extended to a greater number of hypotheses. Moreover, althoughFIG. 5(b) illustrates a linear weighting model, non-linear weighting models also may be used. -
FIG. 6 is a functional block diagram of acoding system 600 according to an aspect of the present disclosure. Thesystem 600 may include apixel block coder 610, apixel block decoder 620, aframe buffer 630, an in-loop filter system 640, areference frame store 650, apredictor 660, acontroller 670, and asyntax unit 680. Thepredictor 660 may develop the different hypotheses for use during coding of a newly-presented input pixel block s and it may supply a prediction block ŝ to thepixel block coder 610. Thepixel block coder 610 may code the new pixel block by predictive coding techniques and present coded pixel block data to thesyntax unit 680. Thepixel block decoder 620 may decode the coded pixel block data, generating decoded pixel block data therefrom. Theframe buffer 630 may generate reconstructed frame data from the decoded pixel block data. The in-loop filter 640 may perform one or more filtering operations on the reconstructed frame. For example, the in-loop filter 640 may perform deblocking filtering, sample adaptive offset (SAO) filtering, adaptive loop filtering (ALF), maximum likelihood (ML) based filtering schemes, deringing, debanding, sharpening, resolution scaling, and the like. Thereference frame store 650 may store the filtered frame, where it may be used as a source of prediction of later-received pixel blocks. Thesyntax unit 680 may assemble a data stream from the coded pixel block data, which conforms to a governing coding protocol. - In MHMC coding, the
predictor 660 may select the different hypotheses from among the different candidate prediction modes that are available under a governing coding syntax. Thepredictor 660 may decide, for example, the number of hypotheses that may be used, the prediction sources for those hypotheses and, in certain aspects, partitioning sizes at which the predictions will be performed. For example, thepredictor 660 may decide whether a given input pixel block will be coded using a prediction block that matches the sizes of the input pixel block or whether it will be coded using prediction blocks at smaller sizes. Thepredictor 660 also may decide, for some smaller-size partitions of the input block, that SKIP coding will be applied to one or more of the partitions (called “null” coding herein). - The
pixel block coder 610 may include asubtractor 612, atransform unit 614, aquantizer 616, and anentropy coder 618. Thepixel block coder 610 may accept pixel blocks of input data at thesubtractor 612. Thesubtractor 612 may receive predicted pixel blocks from thepredictor 660 and generate an array of pixel residuals therefrom representing a difference between the input pixel block and the predicted pixel block. Thetransform unit 614 may apply a transform to the sample data output from thesubtractor 612, to convert data from the pixel domain to a domain of transform coefficients. Thequantizer 616 may perform quantization of transform coefficients output by thetransform unit 614. Thequantizer 616 may be a uniform or a non-uniform quantizer. Theentropy coder 618 may reduce bandwidth of the output of the coefficient quantizer by coding the output, for example, by variable length code words or using a context adaptive binary arithmetic coder. - The
transform unit 614 may operate in a variety of transform modes as determined by thecontroller 670. For example, thetransform unit 614 may apply a discrete cosine transform (DCT), a discrete sine transform (DST), a Walsh-Hadamard transform, a Haar transform, a Daubechies wavelet transform, or the like. In an aspect, thecontroller 670 may select a coding mode M to be applied by the transform unit 615, may configure the transform unit 615 accordingly and may signal the coding mode M in the coded video data, either expressly or impliedly. - The
quantizer 616 may operate according to a quantization parameter QP that is supplied by thecontroller 670. In an aspect, the quantization parameter QP may be applied to the transform coefficients as a multi-value quantization parameter, which may vary, for example, across different coefficient locations within a transform-domain pixel block. Thus, the quantization parameter QP may be provided as a quantization parameters array. - The
entropy coder 618, as its name implies, may perform entropy coding of data output from thequantizer 616. For example, theentropy coder 618 may perform run length coding, Huffman coding, Golomb coding, Context Adaptive Binary Arithmetic Coding, and the like. - The
pixel block decoder 620 may invert coding operations of thepixel block coder 610. For example, thepixel block decoder 620 may include adequantizer 622, aninverse transform unit 624, and anadder 626. Thepixel block decoder 620 may take its input data from an output of thequantizer 616. Although permissible, thepixel block decoder 620 need not perform entropy decoding of entropy-coded data since entropy coding is a lossless event. Thedequantizer 622 may invert operations of thequantizer 616 of thepixel block coder 610. Thedequantizer 622 may perform uniform or non-uniform de-quantization as specified by the decoded signal QP. Similarly, theinverse transform unit 624 may invert operations of thetransform unit 614. Thedequantizer 622 and theinverse transform unit 624 may use the same quantization parameters QP and transform mode M as their counterparts in thepixel block coder 610. Quantization operations likely will truncate data in various respects and, therefore, data recovered by thedequantizer 622 likely will possess coding errors when compared to the data presented to thequantizer 616 in thepixel block coder 610. - The
adder 626 may invert operations performed by thesubtractor 612. It may receive the same prediction pixel block from thepredictor 660 that thesubtractor 612 used in generating residual signals. Theadder 626 may add the prediction pixel block to reconstructed residual values output by theinverse transform unit 624 and may output reconstructed pixel block data. - As described, the
frame buffer 630 may assemble a reconstructed frame from the output of thepixel block decoders 620. The in-loop filter 640 may perform various filtering operations on recovered pixel block data. For example, the in-loop filter 640 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in loop filters (not shown). - The
reference frame store 650 may store filtered frame data for use in later prediction of other pixel blocks. Different types of prediction data are made available to thepredictor 660 for different prediction modes. For example, for an input pixel block, intra prediction takes a prediction reference from decoded data of the same frame in which the input pixel block is located. Thus, thereference frame store 650 may store decoded pixel block data of each frame as it is coded. For the same input pixel block, inter prediction may take a prediction reference from previously coded and decoded frame(s) that are designated as reference frames. Thus, thereference frame store 650 may store these decoded reference frames. - As discussed, the
predictor 660 may supply prediction blocks ŝ to thepixel block coder 610 for use in generating residuals. Thepredictor 660 may include, for each of a plurality of hypotheses 661.1-661.n, aninter predictor 662, anintra predictor 663, and amode decision unit 662. The different hypotheses 661.1-661.n may operate at different partition sizes as described above. For each hypothesis, theinter predictor 662 may receive pixel block data representing a new pixel block to be coded and may search reference frame data fromstore 650 for pixel block data from reference frame(s) for use in coding the input pixel block. The inter-predictor 662 may perform its searches at the partition sizes of the respective hypothesis. Thus, when searching at smaller partition sizes, the inter-predictor 662 may perform multiple searches, one using each of the sub-partitions at work for its respective hypothesis. Theinter predictor 662 may select prediction reference data that provides a closest match to the input pixel block being coded. Theinter predictor 662 may generate prediction reference metadata, such as prediction block size and motion vectors, to identify which portion(s) of which reference frames were selected as source(s) of prediction for the input pixel block. - The
intra predictor 663 may support Intra (I) mode coding. Theintra predictor 663 may search from among pixel block data from the same frame as the pixel block being coded that provides a closest match to the input pixel block. Theintra predictor 663 also may run searches at the partition size for its respective hypothesis and, when sub-partitions are employed, separate searches may be run for each sub-partition. Theintra predictor 663 also may generate prediction mode indicators to identify which portion of the frame was selected as a source of prediction for the input pixel block. - The
mode decision unit 664 may select a final coding mode for the hypothesis from the output of the inter-predictor 662 and the inter-predictor 663. Themode decision unit 664 may output prediction data and the coding parameters (e.g., selection of reference frames, motion vectors and the like) for the mode selected for the respective hypothesis. Typically, as described above, themode decision unit 664 will select a mode that achieves the lowest distortion when video is decoded given a target bitrate. Exceptions may arise when coding modes are selected to satisfy other policies to which thecoding system 600 adheres, such as satisfying a particular channel behavior, or supporting random access or data refresh policies. - Prediction data output from the
mode decision units 664 of the different hypotheses 661.1-661.N may be input to a predictionblock synthesis unit 665, which merges the prediction data into an aggregate prediction block ŝ. As described, the prediction block ŝ may be formed from a linear combination of the predictions from the individual hypotheses, for example, as set forth in Eq. 1, or non-linear combinations may be performed. The predictionblock synthesis unit 665 may supply the prediction block ŝ to thepixel block coder 610. Thepredictor 660 may output to thecontroller 670 parameters representing coding decisions for each hypothesis. - The
controller 670 may control overall operation of thecoding system 600. Thecontroller 670 may select operational parameters for thepixel block coder 610 and thepredictor 660 based on analyses of input pixel blocks and also external constraints, such as coding bitrate targets and other operational parameters. As is relevant to the present discussion, when it selects quantization parameters QP, the use of uniform or non-uniform quantizers, and/or the transform mode M, it may provide those parameters to thesyntax unit 680, which may include data representing those parameters in the data stream of coded video data output by thesystem 600. Thecontroller 670 also may select between different modes of operation by which the system may generate reference images and may include metadata identifying the modes selected for each portion of coded data. - During operation, the
controller 670 may revise operational parameters of thequantizer 616 and the transform unit 615 at different granularities of image data, either on a per pixel block basis or on a larger granularity (for example, per frame, per slice, per largest coding unit (“LCU”) or Coding Tree Unit (CTU), or another region). In an aspect, the quantization parameters may be revised on a per-pixel basis within a coded frame. - Additionally, as discussed, the
controller 670 may control operation of the in-loop filter 640 and theprediction unit 660. Such control may include, for theprediction unit 660, mode selection (lambda, modes to be tested, search windows, distortion strategies, etc.), and, for the in-loop filter 640, selection of filter parameters, reordering parameters, weighted prediction, etc. -
FIG. 7 is a functional block diagram of adecoding system 700 according to an aspect of the present disclosure. Thedecoding system 700 may include asyntax unit 710, apixel block decoder 720, an in-loop filter 740, areference frame store 750, apredictor 760, and acontroller 770. As with the encoder (FIG. 6 ), thepixel block decoder 720 andpredictor 760 may be instantiated for each of the hypotheses identified by the coded video data. - The
syntax unit 710 may receive a coded video data stream and may parse the coded data into its constituent parts. Data representing coding parameters may be furnished to thecontroller 770, while data representing coded residuals (the data output by thepixel block coder 610 ofFIG. 6 ) may be furnished to its respectivepixel block decoder 720. Thepredictor 760 may generate a prediction block ŝ from reference data available in thereference frame store 750 according to coding parameter data provided in the coded video data. It may supply the prediction block ŝ to the pixel block decoder. Thepixel block decoder 720 may invert coding operations applied by the pixel block coder 610 (FIG. 6 ). Theframe buffer 730 may create a reconstructed frame from decoded pixel blocks s′ output by thepixel block decoder 720. The in-loop filter 740 may filter the reconstructed frame data. The filtered frames may be output from thedecoding system 700. Filtered frames that are designated to serve as reference frames also may be stored in thereference frame store 750. - The
pixel block decoder 720 may include anentropy decoder 722, adequantizer 724, aninverse transform unit 726, and anadder 728. Theentropy decoder 722 may perform entropy decoding to invert processes performed by the entropy coder 618 (FIG. 6 ). Thedequantizer 724 may invert operations of the quantizer 716 of the pixel block coder 610 (FIG. 6 ). Similarly, theinverse transform unit 726 may invert operations of the transform unit 614 (FIG. 6 ). They may use the quantization parameters QP and transform modes M that are provided in the coded video data stream. Because quantization is likely to truncate data, the pixel blocks s′ recovered by thedequantizer 724, likely will possess coding errors when compared to the input pixel blocks s presented to thepixel block coder 610 of the encoder (FIG. 6 ). - The
adder 728 may invert operations performed by the subtractor 610 (FIG. 6 ). It may receive a prediction pixel block from thepredictor 760 as determined by prediction references in the coded video data stream. Theadder 728 may add the prediction pixel block to reconstructed residual values output by theinverse transform unit 726 and may output reconstructed pixel block data. - As described, the
frame buffer 730 may assemble a reconstructed frame from the output of thepixel block decoder 720. The in-loop filter 740 may perform various filtering operations on recovered pixel block data as identified by the coded video data. For example, the in-loop filter 740 may include a deblocking filter, a sample adaptive offset (“SAO”) filter, and/or other types of in loop filters. In this manner, operation of theframe buffer 730 and the inloop filter 740 mimics operation of thecounterpart frame buffer 630 and inloop filter 640 of the encoder 600 (FIG. 6 ). - The
reference frame store 750 may store filtered frame data for use in later prediction of other pixel blocks. Thereference frame store 750 may store decoded frames as it is coded for use in intra prediction. Thereference frame store 750 also may store decoded reference frames. - As discussed, the
predictor 760 may supply the prediction blocks ŝ to thepixel block decoder 720. Thepredictor 760 may retrieve prediction data from thereference frame store 750 for each of the hypotheses represented in the coded video data (represented by hypothesis predictors 762.1-762.n). A predictionblock synthesis unit 764 may generate an aggregate prediction block ŝ from the prediction data of the different hypothesis. In this manner, the predictionblock synthesis unit 764 may replicate operations of thesynthesis unit 665 from the encoder (FIG. 6 ). Thepredictor 760 may supply predicted pixel block data as determined by the prediction reference indicators supplied in the coded video data stream. - The
controller 770 may control overall operation of thecoding system 700. Thecontroller 770 may set operational parameters for thepixel block decoder 720 and thepredictor 760 based on parameters received in the coded video data stream. As is relevant to the present discussion, these operational parameters may include quantization parameters QP for the dequantizer 724 and transform modes M for theinverse transform unit 710. As discussed, the received parameters may be set at various granularities of image data, for example, on a per pixel block basis, a per frame basis, a per slice basis, a per LCU/CTU basis, or based on other types of regions defined for the input image. - In an aspect, encoders may provide signalling in a coded bit stream to identify the number of hypotheses that are used to code frame data for each such hypothesis. For example, when frame data is to be coded by MHMC, coded video data may identify a number of hypotheses using a count value. When a given hypothesis can be coded as prediction blocks of different sizes, coded video data may contain data identifying such prediction block sizes.
- An exemplary syntax for such identifications is provided in Table 1 below:
-
TABLE 1
if( type == MH ) {
  number_of_hypotheses_minus2                           ue(v)
  for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
    list_prediction_implicit_present_flag[ i ]          u(1)
    log2_min_luma_hypothesis_block_size_minus2[ i ]     ue(v)
    log2_diff_max_min_luma_hypothesis_block_size[ i ]   ue(v)
  }
}
Here, type corresponds to the slice type. If MH, then that slice is indicated as a multihypothesis slice that permits 2 or more hypotheses to be used for prediction. The field number_of_hypotheses_minus2 may identify the number of hypotheses available for coding. The field list_prediction_implicit_present_flag[i] enforces that a hypothesis from list i is always present for all CTUs, or at least for all inter CTUs, in the current slice. The fields log2_min_luma_hypothesis_block_size_minus2[i] and log2_diff_max_min_luma_hypothesis_block_size[i] respectively may identify minimum and maximum sizes of prediction blocks that are available for each such hypothesis. Providing minimum and maximum sizes for the prediction blocks may constrain coding complexity by avoiding MHMC combinations outside the indicated size ranges. - The foregoing example permits use of multiple hypotheses for block sizes starting from M_L×N_L up to M_H×N_H, where the minimum dimensions M_L, N_L follow from the log2_min_luma_hypothesis_block_size_minus2[i] field and the maximum dimensions M_H, N_H follow from additionally applying the log2_diff_max_min_luma_hypothesis_block_size[i] field. For example, if the minimum size for a hypothesis from list i was defined as 16 and the maximum size was defined as 64, then this syntax would permit prediction block sizes from list i of, for example, 16×16, 32×32, and 64×64 pixels. If rectangular partitions were also permitted for prediction, these would also be limited within the specified resolutions. Of course, other variants would be supported by the syntax of Table 1. For example, prediction block sizes may be defined starting from a size 32×32 and above.
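- The size bounds can be recovered from the coded log2 fields with simple shifts. A minimal sketch in C, mirroring the min_luma_block/max_luma_block derivation that appears later in Table 3 (the example coded values are assumptions):

    #include <stdio.h>

    int main(void) {
        /* Example coded values (assumptions): a 16-pixel minimum
           (log2(16) - 2 = 2) and a 64-pixel maximum (log2(64) - log2(16) = 2). */
        unsigned log2_min_minus2   = 2;
        unsigned log2_diff_max_min = 2;

        /* Same derivation as Table 3: min_luma_block = log2_min + 2, then shift. */
        unsigned min_size = 1u << (log2_min_minus2 + 2);    /* 16 */
        unsigned max_size = min_size << log2_diff_max_min;  /* 64 */

        printf("hypothesis prediction block sizes: %u..%u\n", min_size, max_size);
        return 0;
    }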
- The syntax identified in Table 1 may be provided at different levels of coded video data, such as at the sequence, picture, or slice level, for a segment, or for a group of CTUs. Table 12 below illustrates application of the syntax to an exemplary HEVC-style slice syntactic element. Moreover, the syntax elements may be assigned to these levels dynamically based on coding decisions performed by an encoder. In this manner, the syntax increases flexibility in the encoder and permits the encoder to conserve signalling overhead when the syntax elements can be provided in higher-level elements of the coding protocol.
- Table 2, for example, provides an example in which slice segment header syntax is extended to include parameters that indicate for which block sizes a particular hypothesis will be valid. A single parameter is identified for the minimum width and height, i.e. log2_min_luma_hypothesis_block_size_minus2[i], as well as for the maximum, i.e. log2_diff_max_min_luma_hypothesis_block_size[i]. The list_prediction_implicit_present_flag[i] discussed above is provided as well. Independent parameters for the width and height could also be used. Additional elements that are impacted by the use of multihypothesis prediction instead of biprediction are also signalled, such as the number of references for each list (num_ref_idx_active_minus1_list[i]), the use of zero motion prediction (mvd_list_zero_flag[i]), and the list from which the collocated temporal motion vector will be derived.
-
TABLE 2
Exemplary Slice Segment Header to Support Multihypothesis Prediction
                                                                    Descriptor
slice_segment_header( ) {
  first_slice_segment_in_pic_flag                                   u(1)
  if( nal_unit_type >= BLA_W_LP && nal_unit_type <= RSV_IRAP_VCL23 )
    no_output_of_prior_pics_flag                                    u(1)
  slice_pic_parameter_set_id                                        ue(v)
  if( !first_slice_segment_in_pic_flag ) {
    if( dependent_slice_segments_enabled_flag )
      dependent_slice_segment_flag                                  u(1)
    slice_segment_address                                           u(v)
  }
  if( !dependent_slice_segment_flag ) {
    for( i = 0; i < num_extra_slice_header_bits; i++ )
      slice_reserved_flag[ i ]                                      u(1)
    slice_type                                                      ue(v)
    if( slice_type = = MH ) {
      number_of_lists_minus2                                        ue(v)
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        list_prediction_implicit_present_flag[ i ]                  u(1)
        log2_min_luma_hypothesis_block_size_minus2[ i ]             ue(v)
        log2_diff_max_min_luma_hypothesis_block_size[ i ]           ue(v)
      }
    } else { /* slice_type == P */
      number_of_lists_minus2 = −1
      list_prediction_implicit_present_flag[ 0 ] = 1
    }
    if( output_flag_present_flag )
      pic_output_flag                                               u(1)
    if( separate_colour_plane_flag = = 1 )
      colour_plane_id                                               u(2)
    if( nal_unit_type != IDR_W_RADL && nal_unit_type != IDR_N_LP ) {
      slice_pic_order_cnt_lsb                                       u(v)
      short_term_ref_pic_set_sps_flag                               u(1)
      if( !short_term_ref_pic_set_sps_flag )
        st_ref_pic_set( num_short_term_ref_pic_sets )
      else if( num_short_term_ref_pic_sets > 1 )
        short_term_ref_pic_set_idx                                  u(v)
      if( long_term_ref_pics_present_flag ) {
        if( num_long_term_ref_pics_sps > 0 )
          num_long_term_sps                                         ue(v)
        num_long_term_pics                                          ue(v)
        for( i = 0; i < num_long_term_sps + num_long_term_pics; i++ ) {
          if( i < num_long_term_sps ) {
            if( num_long_term_ref_pics_sps > 1 )
              lt_idx_sps[ i ]                                       u(v)
          } else {
            poc_lsb_lt[ i ]                                         u(v)
            used_by_curr_pic_lt_flag[ i ]                           u(1)
          }
          delta_poc_msb_present_flag[ i ]                           u(1)
          if( delta_poc_msb_present_flag[ i ] )
            delta_poc_msb_cycle_lt[ i ]                             ue(v)
        }
      }
      if( sps_temporal_mvp_enabled_flag )
        slice_temporal_mvp_enabled_flag                             u(1)
    }
    if( sample_adaptive_offset_enabled_flag ) {
      slice_sao_luma_flag                                           u(1)
      if( ChromaArrayType != 0 )
        slice_sao_chroma_flag                                       u(1)
    }
    if( slice_type = = P | | slice_type = = MH ) {
      num_ref_idx_active_override_flag                              u(1)
      if( num_ref_idx_active_override_flag ) {
        for( i = 0; i < number_of_lists_minus2 + 2; i++ )
          num_ref_idx_active_minus1_list[ i ]                       ue(v)
      }
      if( lists_modification_present_flag && NumPicTotalCurr > 1 )
        ref_pic_lists_modification( )
      mvd_list_zero_flag[ 0 ] = 0
      for( i = 1; i < number_of_lists_minus2 + 2; i++ )
        mvd_list_zero_flag[ i ]                                     u(1)
      if( cabac_init_present_flag )
        cabac_init_flag                                             u(1)
      if( slice_temporal_mvp_enabled_flag ) {
        if( slice_type = = MH )
          collocated_from_list                                      ue(v)
        else
          collocated_from_list = 0
        if( num_ref_idx_active_minus1_list[ collocated_from_list ] > 0 )
          collocated_ref_idx                                        ue(v)
      }
      if( ( weighted_pred_flag && slice_type = = P ) | |
          ( weighted_mh_pred_flag && slice_type = = MH ) )
        pred_weight_table( )
      five_minus_max_num_merge_cand                                 ue(v)
      if( motion_vector_resolution_control_idc = = 2 )
        use_integer_mv_flag                                         u(1)
    }
    slice_qp_delta                                                  se(v)
    if( pps_slice_chroma_qp_offsets_present_flag ) {
      slice_cb_qp_offset                                            se(v)
      slice_cr_qp_offset                                            se(v)
    }
    if( pps_slice_act_qp_offsets_present_flag ) {
      slice_act_y_qp_offset                                         se(v)
      slice_act_cb_qp_offset                                        se(v)
      slice_act_cr_qp_offset                                        se(v)
    }
    if( chroma_qp_offset_list_enabled_flag )
      cu_chroma_qp_offset_enabled_flag                              u(1)
    if( deblocking_filter_override_enabled_flag )
      deblocking_filter_override_flag                               u(1)
    if( deblocking_filter_override_flag ) {
      slice_deblocking_filter_disabled_flag                         u(1)
      if( !slice_deblocking_filter_disabled_flag ) {
        slice_beta_offset_div2                                      se(v)
        slice_tc_offset_div2                                        se(v)
      }
    }
    if( pps_loop_filter_across_slices_enabled_flag &&
        ( slice_sao_luma_flag | | slice_sao_chroma_flag | |
          !slice_deblocking_filter_disabled_flag ) )
      slice_loop_filter_across_slices_enabled_flag                  u(1)
  }
  if( tiles_enabled_flag | | entropy_coding_sync_enabled_flag ) {
    num_entry_point_offsets                                         ue(v)
    if( num_entry_point_offsets > 0 ) {
      offset_len_minus1                                             ue(v)
      for( i = 0; i < num_entry_point_offsets; i++ )
        entry_point_offset_minus1[ i ]                              u(v)
    }
  }
  if( slice_segment_header_extension_present_flag ) {
    slice_segment_header_extension_length                           ue(v)
    for( i = 0; i < slice_segment_header_extension_length; i++ )
      slice_segment_header_extension_data_byte[ i ]                 u(8)
  }
  byte_alignment( )
}

- The foregoing syntax table also provides examples of other coding parameters that may be included. The num_ref_idx_active_minus1_list[i] parameter may specify a maximum reference index for a respective reference picture list that may be used to decode a slice. The collocated_from_list parameter, for example, may identify the list from which temporal motion vector prediction is derived. The collocated_ref_idx parameter may identify a reference index of the collocated picture used for temporal motion vector prediction.
- MHMC coding techniques may be represented in coding of a pixel block using a syntax as shown, for example, in Table 3. In this example, prediction units may be defined for each prediction block in use, and their sizes may be identified using the nPbW, nPbH values. Unlike HEVC, which limits prediction to up to two lists and indicates the prediction type with a mode parameter, here the use of a particular hypothesis is indicated through a list-specific flag, i.e. list_pred_idc_flag[i][x0][y0]. This flag is either derived based on syntax components signalled earlier at higher levels or based on the utilization of other lists, or explicitly signalled in the bit stream. If this flag is one, then that hypothesis will be used in combination with other enabled hypotheses; if it is set to zero, then a hypothesis from that list will not be considered. Depending on its value, other parameters, such as reference indices and motion vectors, but also other information that may be signalled at this level, such as illumination compensation parameters, can then be signalled.
-
TABLE 3
HEVC-style Prediction Unit Syntax To Accommodate MHMC
                                                                    Descriptor
prediction_unit( x0, y0, nPbW, nPbH ) {
  if( cu_skip_flag[ x0 ][ y0 ] ) {
    if( MaxNumMergeCand > 1 )
      merge_idx[ x0 ][ y0 ]                                         ae(v)
  } else { /* MODE_INTER */
    merge_flag[ x0 ][ y0 ]                                          ae(v)
    if( merge_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]                                       ae(v)
    } else {
      set_lists = 0
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        min_luma_block = ( log2_min_luma_hypothesis_block_size_minus2[ i ] + 2 )
        max_luma_block = min_luma_block + log2_diff_max_min_luma_hypothesis_block_size[ i ]
        if( ( nPbW < ( 1 << min_luma_block ) ) || ( nPbH < ( 1 << min_luma_block ) ) ||
            ( nPbW > ( 1 << max_luma_block ) ) || ( nPbH > ( 1 << max_luma_block ) ) )
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 0
        else if( !list_prediction_implicit_present_flag[ i ] &&
                 ( ( i < number_of_lists_minus2 + 1 ) | | set_lists > 0 ) )
          list_pred_idc_flag[ i ][ x0 ][ y0 ]                       ae(v)
        else
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 1
        set_lists += list_pred_idc_flag[ i ][ x0 ][ y0 ]
      }
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        if( list_pred_idc_flag[ i ][ x0 ][ y0 ] ) {
          if( num_ref_idx_active_minus1_list[ i ] > 0 )
            ref_idx_list[ i ][ x0 ][ y0 ]                           ae(v)
          if( mvd_list_zero_flag[ i ] && set_lists > 1 ) {
            MvdList[ i ][ x0 ][ y0 ][ 0 ] = 0
            MvdList[ i ][ x0 ][ y0 ][ 1 ] = 0
          } else
            mvd_coding( x0, y0, i )
          mvp_list_flag[ i ][ x0 ][ y0 ]                            ae(v)
        }
      }
    }
  }
}

- In this example, the field list_pred_idc_flag[i][x0][y0] indicates whether a list corresponding to hypothesis i is to be used for coding, ref_idx_list[i][x0][y0] indicates an index into the respective list i identifying a reference frame that was used for prediction, and mvp_list_flag[i][x0][y0] indicates a motion vector predictor index of list i.
- In another aspect, a coding syntax may impose a restriction on the number of “minimum” references that can be combined together. For example, the syntax may restrict any hypotheses from list index i, with i>1, to be used only when a list with a lower index also is used or, alternatively, if all lower index lists are present. All of these restrictions could also be supported or defined by either implicitly imposing them for all bit streams or by signalling the restrictions for multihypothesis prediction in a higher syntax level. As discussed below, aspects of the present disclosure accommodate such restrictions as multihypothesis modes.
- For example, an encoder may signal a parameter mh_mode in the sequence, picture, or slice header, which will indicate the mode of/restrictions imposed on multihypothesis prediction. For example, if mh_mode==0, no further restrictions are imposed. If mh_mode==1, then an encoder may signal lists with index i>1 if and only if all other lists with index j<i are also used; otherwise, those lists are implicitly set to not present. Other modes could also be defined in addition to or in place of the above modes.
- In such a case, coding syntax for the prediction unit may be represented as follows:
-
TABLE 4
HEVC-style Alternative Prediction Unit Syntax Modified To Accommodate MHMC
                                                                    Descriptor
prediction_unit( x0, y0, nPbW, nPbH ) {
  if( cu_skip_flag[ x0 ][ y0 ] ) {
    if( MaxNumMergeCand > 1 )
      merge_idx[ x0 ][ y0 ]                                         ae(v)
  } else { /* MODE_INTER */
    merge_flag[ x0 ][ y0 ]                                          ae(v)
    if( merge_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]                                       ae(v)
    } else {
      set_lists = 0
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        min_luma_block = ( log2_min_luma_hypothesis_block_size_minus2[ i ] + 2 )
        max_luma_block = min_luma_block + log2_diff_max_min_luma_hypothesis_block_size[ i ]
        if( ( nPbW < ( 1 << min_luma_block ) ) || ( nPbH < ( 1 << min_luma_block ) ) ||
            ( nPbW > ( 1 << max_luma_block ) ) || ( nPbH > ( 1 << max_luma_block ) ) ||
            ( mh_mode == 1 && i > 1 && set_lists < i ) )
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 0
        else if( mh_mode == 1 && i == 1 && set_lists == 0 )
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 1
        else if( !list_prediction_implicit_present_flag[ i ] &&
                 ( ( i < number_of_lists_minus2 + 1 ) | | set_lists > 0 ) )
          list_pred_idc_flag[ i ][ x0 ][ y0 ]                       ae(v)
        else
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 1
        set_lists += list_pred_idc_flag[ i ][ x0 ][ y0 ]
      }
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        if( list_pred_idc_flag[ i ][ x0 ][ y0 ] ) {
          if( num_ref_idx_active_minus1_list[ i ] > 0 )
            ref_idx_list[ i ][ x0 ][ y0 ]                           ae(v)
          if( mvd_list_zero_flag[ i ] && set_lists > 1 ) {
            MvdList[ i ][ x0 ][ y0 ][ 0 ] = 0
            MvdList[ i ][ x0 ][ y0 ][ 1 ] = 0
          } else
            mvd_coding( x0, y0, i )
          mvp_list_flag[ i ][ x0 ][ y0 ]                            ae(v)
        }
      }
    }
  }
}

- In this example, the field list_pred_idc_flag[i][x0][y0] indicates whether a list corresponding to hypothesis i is to be used for coding, ref_idx_list[i][x0][y0] indicates an index into the respective list i identifying a reference frame that was used for prediction, and mvp_list_flag[i][x0][y0] indicates a motion vector predictor index of list i. Unlike the syntax presented in Table 3, mh_mode is now also considered to determine how, and whether, multihypothesis prediction will be performed.
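- The availability rule that mh_mode==1 imposes can be expressed compactly. A minimal sketch in C of the condition used in Table 4 (the function name is illustrative):

    /* Returns 0 when a hypothesis from list i must be treated as "not present"
       under mh_mode == 1: lists with index i > 1 require that all lower-index
       lists (counted in set_lists, as in Table 4) are already in use. */
    int list_may_be_present(int mh_mode, int i, int set_lists) {
        if (mh_mode == 1 && i > 1 && set_lists < i)
            return 0;
        return 1;
    }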
- As described, the foregoing techniques reduce the overhead of signalling the coding mode for multihypothesis prediction, i.e. when to use and when not to use a certain hypothesis. To further reduce signalling overhead, coding syntax may employ additional techniques, as described below.
- In another aspect, motion parameters of a given hypothesis may be derived through relationship(s) with other hypotheses. For example, motion parameters or their differential values may be set to have a value of zero using the mvd_list_zero_flag parameter. Moreover, the syntax may be designed so it can specify any of the following cases for implicit derivation of motion parameters for a list:
- a. Enforce all motion parameters for a particular list index to be zero or a fixed vector (sent at a higher syntax layer, such as the sequence or picture parameter sets, or the slice header).
- b. Enforce either the vertical or horizontal motion vector component value of a particular list index to a fixed (zero or otherwise) value. Again such information could be signalled at a higher syntax layer.
- c. Set illumination compensation (weighted prediction) parameters to fixed values for all or for some of the lists (e.g. for lists with i>1). If explicit weighting parameters are present at the block level, the syntax may have explicit parameters only for the earlier lists (i<2) and, for all others, illumination parameters could be derived from the parameters of the earlier lists or fixed to default weighting parameters. Parameters could also be derived from previously-decoded neighbouring samples with corresponding increases to the complexity of the codec.
- d. Establish a mathematical relationship of the motion vectors of one list i with the motion vectors of another, earlier-in-presence list j (j<i). The relationship may also be related to the ref_idx and its associated picture order count, or to another parameter available in the codec that indicates a form of temporal distance relationship between references. Alternatively, the relationship may be based on signalled parameters in a higher level syntax, e.g. a linear model using parameters alpha (α) and beta (β). These parameters could also be vector values and have different elements for the horizontal and vertical motion vector components. These parameters could then be used to weigh and offset the decoded motion parameters of list j. For example, a motion vector mv_i may be set as
$\overrightarrow{mv_i} = \alpha \times \overrightarrow{mv_j} + \beta$. Other non-linear mathematical models may also be employed. - e. Mathematical relationships may also be defined by providing the relationships of a list with more than one other list. In that case, the motion vector may be computed as
$\overrightarrow{mv_i} = \sum_{j<i} ( \alpha_j \times \overrightarrow{mv_j} + \beta_j )$,
- assuming a linear model. Again, non-linear models could also be used.
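- A minimal sketch in C of the linear derivation in item (d); representing α as a rational number for integer arithmetic is an assumption, as is the truncating division:

    typedef struct { int x, y; } MV;

    /* Derive the list-i vector from the decoded list-j vector using signalled
       linear-model parameters (item d): mv_i = alpha * mv_j + beta. */
    MV derive_mv_linear(MV mv_j, int alpha_num, int alpha_den, MV beta) {
        MV mv_i;
        mv_i.x = (alpha_num * mv_j.x) / alpha_den + beta.x;
        mv_i.y = (alpha_num * mv_j.y) / alpha_den + beta.y;
        return mv_i;
    }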
- f. Motion vectors for a list could also be derived through motion vector chaining. In particular, the syntax may represent motion parameters for a block at location (x,y) in frame i, using a reference j, if another list provides a motion vector pointing to another reference k in which the co-located block, also at location (x,y), uses a motion vector that points at reference j. In such a case, $MV_{i,j}(x,y) = MV_{i,k}(x,y) + MV_{k,j}(x,y)$. Motion vector chaining could be considered using multiple vectors, if these are available through other lists.
-
$\overrightarrow{mv_{i,j}(x,y)} = \sum_{m=0}^{m<n} \left( \alpha_j \times \left( \overrightarrow{mv_{i,k_m}(x,y)} + \overrightarrow{mv_{k_m,j}(x,y)} \right) \right)$ (Eq. 3) - Chaining could also include multiple references if the path to the target reference is not immediate, e.g. $MV_{i,j}(x,y) = MV_{i,k}(x,y) + MV_{k,m}(x,y) + MV_{m,j}(x,y)$. In effect, a motion vector from a frame at time t may reference data in a frame at time t−2 by building an aggregate motion vector from a first motion vector that references an intermediate frame at time t−1 (e.g., a motion vector from time t to t−1) and a second motion vector that references the destination frame at time t−2 from the intermediate frame (e.g., a motion vector from time t−1 to t−2).
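- Motion vector chaining of item (f) reduces to vector addition along the reference path. A minimal sketch in C of the two-hop case (a single path with unit weight):

    typedef struct { int x, y; } MV;

    /* Chain the vector from frame i to intermediate reference k with the
       co-located block's vector from k to the target reference j. */
    MV chain_mv(MV mv_i_to_k, MV mv_k_to_j) {
        MV mv_i_to_j = { mv_i_to_k.x + mv_k_to_j.x, mv_i_to_k.y + mv_k_to_j.y };
        return mv_i_to_j;
    }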
- g. Motion acceleration may also be considered to provide a basis to predict current motion vectors. For example, the predictor of a motion vector may be computed for frame i of a block at location (x,y) from frame i−1, assuming that all previous partitions used reference frames that were only at a distance of 1, as follows:
-
$MV_{(i,i-1)}(x,y) = 2 \times MV_{(i-1,i-2)}(x,y) - MV_{(i-2,i-3)}(x,y)$ (Eq. 4) - Of course, if the frames have different distance relationships, the acceleration computation would account for such relationships.
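- Eq. 4 is a linear extrapolation. A minimal sketch in C, assuming unit temporal distances as the text does:

    typedef struct { int x, y; } MV;

    /* Motion-acceleration predictor of Eq. 4: extrapolate from the two most
       recent co-located vectors, each spanning a temporal distance of 1. */
    MV accel_predictor(MV mv_i1_i2, MV mv_i2_i3) {
        MV pred = { 2 * mv_i1_i2.x - mv_i2_i3.x, 2 * mv_i1_i2.y - mv_i2_i3.y };
        return pred;
    }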
- h. In another aspect, motion vector derivation may be based on partitions in other frames and their actual motion path. Such a process involves higher complexity since the projected area may not correspond to a block used for coding. Such a strategy is used, for example, in the AVS video coding standards. Furthermore, the projected area may contain multiple subpartitions, each having its own motion. In such cases, encoders and decoders either may combine all the vectors together or may use all the vectors as independent hypotheses and subpartition the block being coded based on these hypotheses. Combining the vectors could be done using a variety of methods, e.g. simple averaging, weighted averaging based on the number of pixels that have a certain motion, median filtering, or selection of the mode candidate. Moreover, chaining could persist across multiple pictures, with the number of pictures to be chained constrained by the codec at a higher level if desired. In such a case also, if there is no usable vector, such as when a block is intra coded, chaining could terminate.
- i. Non-linear combinations of prediction samples may be performed instead of, or in addition to, straight averaging or weighted averaging. For example, different predictors h_i(x,y) may be combined non-linearly using an equation of the form:
-
- Here, ref(x,y) may represent an anchor predictor that is most correlated with the current signal and f is the non-linear function. The function could be, for example, a Gaussian function or some other similar function. For example, the anchor predictor can be chosen from the temporally most adjacent frame, or as the predictor associated with the smallest QP. The anchor predictor could also be indicated at a higher syntax structure, e.g. the slice header.
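- Since the exact form of f is left open, the following C sketch shows one plausible instance of item (i): Gaussian weights centred on the anchor predictor ref(x,y). The Gaussian form and the sigma parameter are assumptions for illustration:

    #include <math.h>

    /* Blend n hypothesis predictors for one sample position, weighting each
       by a Gaussian function of its difference from the anchor predictor. */
    double combine_nonlinear(const double *h, int n, double ref, double sigma) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < n; i++) {
            double d = h[i] - ref;                        /* distance to anchor */
            double w = exp(-(d * d) / (2.0 * sigma * sigma));
            num += w * h[i];
            den += w;
        }
        return den > 0.0 ? num / den : ref;               /* normalized blend */
    }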
- j. In another aspect, the different predictors for multihypothesis prediction can be weighted based also on location of the current sample and its distance to the blocks used for predicting the hypotheses motion vectors. In particular, for a block that is predicted using two hypotheses, where the first hypothesis used the motion information from the left block as its predictor and the second hypothesis used the top block as its predictor, the first hypothesis may be weighted to have higher impact on samples that are closer to the left edge of the block and the second hypothesis may be weighted to have higher impact on samples that are closer to the top edge of the block. In another example, if a block has predictors from above, from the left, from the above left, and from above right, as well as a temporal predictor, weights may be applied to each of these predictors. The temporal hypothesis may be assigned a single weight everywhere, whereas the hypothesis from the left may have relatively large weights on the left boundaries of the block with weight reductions at locations towards the right of the reconstructed block. Analogous weight distributions may be applied for the hypothesis at other locations (for example, the above block), where weights may be modulated based on directions of prediction.
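- A minimal C sketch of the position-dependent weighting in item (j), assuming a linear decay of the left-predicted hypothesis' weight with distance from the left edge (the decay law is an assumption):

    /* Weight for a hypothesis whose motion predictor came from the left
       neighbour: largest at the left edge (x = 0), decaying toward the right.
       An analogous ramp in y would serve a top-predicted hypothesis. */
    double left_hypothesis_weight(int x, int block_width) {
        return (double)(block_width - x) / (double)block_width;
    }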
- k. Lists with i>1 could be restricted to use only one reference index. In this case, no signalling of the reference index for these lists would be necessary.
- l. Motion vectors for a particular list may be limited to values within a defined window around the (0,0) motion vector that is known to the decoder. Such limitations may contribute to better data prefetch operations at a decoder for motion compensation using more than 2 hypotheses.
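- A minimal C sketch of the windowing in item (l); the symmetric square window is an assumption:

    typedef struct { int x, y; } MV;

    static int clamp(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

    /* Clip a restricted list's vector to a window around (0,0) known to the
       decoder, easing reference-data prefetch for many-hypothesis blocks. */
    MV clip_mv_to_window(MV mv, int window) {
        MV out = { clamp(mv.x, -window, window), clamp(mv.y, -window, window) };
        return out;
    }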
- m. Subpixel motion compensation may be limited for multihypothesis prediction. For example, multihypothesis prediction may be limited to only ⅛th, quarter, half pel, or integer precision. Integer prediction could also be further restricted to certain integer multiples, e.g. multiples of 2, 3, 4, etc, or in general powers of 2. Such restrictions could be imposed on all lists if more hypotheses than 2 are used, or only on certain lists, e.g. list i>1, or list combinations, e.g. whenever
list 2 is used, all lists are restricted to half-pel precision. If such precision is used, and if these methods are combined with implicit motion vector derivation from another list, appropriate quantization and clipping of the derived motion parameters may be performed. - n. Limitations could also be imposed on the filters used for subpixel prediction, i.e. on the number of filter taps that will be used to generate those samples. For example, for uni- or bi-prediction, a coding syntax may permit 8-tap filters to be used, but for multihypothesis prediction with more than 2 references only 4-tap or bilinear filtering may be permitted for all lists. The filters could also be selected based on the list index. The limitations could be based on block size as well, i.e. for larger partitions longer filters may be used, but the size of the filters may be limited for smaller partitions.
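- The precision restriction of item (m) above, combined with implicit derivation, implies a quantization step. A minimal C sketch, assuming quarter-pel storage units (an HEVC-style convention assumed here for illustration):

    /* Quantize a derived motion component to half-pel precision, i.e. to the
       nearest multiple of 2 quarter-pel units, rounding half away from zero. */
    int quantize_to_half_pel(int mv_quarter_pel) {
        int sign = mv_quarter_pel < 0 ? -1 : 1;
        return sign * (((sign * mv_quarter_pel) + 1) / 2) * 2;
    }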
- o. Limitations on the filters could also be imposed differently on luma vs chroma components.
- All of the above conditions could be implicitly designed in the codec, or could be signalled in higher syntax levels of the codec, such as sequence, picture parameter sets, or slice headers.
- Different ones of the foregoing techniques may be applied in a coding syntax for prediction blocks of different sizes. For example, for larger partitions, the cost of signalling additional motion parameters is small. However, for smaller partitions, that cost can be significant. Therefore, for larger partitions, e.g. 16×16 and above, explicit motion signalling of lists i>1 may be permitted but, for partition sizes smaller than 16×16, derivation may be implicit.
- The principles of the foregoing aspects may be used cooperatively with other types of coding techniques. For example, merge and skip modes can also be supported in such codecs, and the derivation of the appropriate motion parameters discussed above may be extended to these modes as well.
- Motion Compensated Prediction with Hierarchical Multihypothesis
- The foregoing techniques find application in coding systems where video data is coded in recursively-partitioned pixel blocks. For example, HEVC and some other coding systems partition a frame of input data first into a Coding Tree Unit (CTU), also called the Largest Coding Unit (commonly, an “LCU”), of a predetermined size (say, 64×64 pixels). Each CTU may be partitioned into increasingly smaller pixel blocks based on the content contained therein. When the partitioning is completed, relationships of the CTUs and the coding units (“CUs”) contained within them may represent a hierarchical tree data structure. Moreover, prediction units (“PUs”) may be provided for the CUs at leaf nodes of the tree structure; the PUs may carry prediction information such as coding mode information, motion vectors, weighted prediction parameters, and the like.
- As discussed, aspects of the present disclosure extend multi-hypothesis prediction to provide for different coding hypotheses at different block sizes. For example, a block of size (2M)×(2M) can be designated as a bi-predicted block but, for each M×M subblock, different motion parameters can be signalled for each hypothesis/list. Furthermore, it is possible that one of the hypotheses/lists would be fixed within the entire block, whereas, for the second hypothesis, further subpartitioning may be performed. In such a scenario, an encoder can keep one of the hypotheses fixed for a larger area, while searching for the second hypothesis using smaller regions. Similarly, a decoder may use hypothesis information to determine whether it should “prefetch” the entire larger area in memory for generating the first hypothesis, whereas for the second hypothesis smaller areas could be brought into memory and then combined with the first hypothesis to generate the final prediction.
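- One way to organize the block-level data this implies, sketched in C; the names and the fixed 4-subblock split are illustrative assumptions:

    typedef struct { int x, y; } MV;

    /* Hierarchical bi-prediction state for a (2M)x(2M) block: hypothesis 0 is
       fixed over the whole block (one large prefetch), while hypothesis 1 is
       refined per MxM subblock (four smaller, later fetches). */
    typedef struct {
        MV hyp0;      /* single vector, applies to the entire (2M)x(2M) block */
        MV hyp1[4];   /* one vector per MxM subblock, raster order */
    } HierBiBlock;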
- Aspects of the present disclosure introduce a new prediction mode, called the “hierarchical multihypothesis” mode, which can be added in addition to the existing single list/unipredictive and bipredictive prediction modes available in existing codecs. For example, in HEVC, the parameter inter_pred_idc[x0][y0] may be expanded to indicate that additional subpartitioning is permitted for a block for any of the hypotheses and lists, i.e. List0, List1, or both. If such subpartitioning is permitted for only one of the hypotheses/lists, e.g. List0, then for the other list, e.g. List1, its associated parameters can be signalled immediately, e.g. within a prediction unit, whereas for the first hypothesis/list the signalling of said parameters would have to be done in combination with its partitioning.
- In a further aspect, subpartitions may include syntax elements to identify a null prediction which indicates that the additional hypothesis from the current list will not be considered for this subpartition. It may also be possible to signal syntax elements that will indicate the use of intra prediction instead of inter prediction in combination with or instead of other inter hypotheses for that subpartition.
- In HEVC, the hierarchical nature of the codec may induce overhead for signalling the prediction mode information, which ordinarily is provided in prediction units that are identified from leaf nodes of coding unit trees. Aspects of the present disclosure provide several techniques to reduce signalling overhead.
- In one aspect, the signalling of inter_pred_idc may be provided at higher levels of the coding_quadtree syntax than the prediction unit. The following syntax tables demonstrate an example of how this could be done. In this particular example, a parameter named “quadtree_mode_enable_flag” may be provided at a sequence, picture, or slice level in a coding syntax and indicates whether the inter_pred_idc designations are provided at higher levels within the coding_quadtree syntax. If the quadtree_mode_enable_flag value is set to TRUE (1), for example, it may indicate that inter_pred_idc designations are so provided. The encoder may signal the inter prediction mode and the split flags in such higher level constructs, for example, at the sequence, picture, or slice level itself, thus reducing the signalled overhead. In this particular example, intra prediction modes are not used within subpartitions, thus reducing overhead further.
- Table 5 provides an exemplary coding quadtree syntax provided in an HEVC-style coding application. The values split_cu_l0_flag[x0][y0], split_cu_l1_flag[x0][y0] and split_cu_flag[x0][y0] respectively indicate whether and how a CU may be further partitioned. The inter_pred_idc[x0][y0] may identify a prediction mode of all sub-partitions contained within the CU where it is identified.
-
TABLE 5
Exemplary Coding Quadtree Syntax
                                                                    Descriptor
coding_quadtree( x0, y0, log2CbSize, cqtDepth ) {
  if( x0 + ( 1 << log2CbSize ) <= pic_width_in_luma_samples &&
      y0 + ( 1 << log2CbSize ) <= pic_height_in_luma_samples &&
      log2CbSize > MinCbLog2SizeY )
    if( quadtree_mode_enable_flag = = 1 && slice_type = = B )
      enable_ctu_mode_flag                                          ae(v)
    else
      enable_ctu_mode_flag = 0
  if( enable_ctu_mode_flag == 1 ) {
    inter_pred_idc[ x0 ][ y0 ]                                      ae(v)
    if( inter_pred_idc[ x0 ][ y0 ] == PRED_HIER_BI ) {
      split_cu_l0_flag[ x0 ][ y0 ]                                  ae(v)
      split_cu_l1_flag[ x0 ][ y0 ]                                  ae(v)
      IsHierSplitMode = 1
    } else
      split_cu_flag[ x0 ][ y0 ]                                     ae(v)
  } else {
    split_cu_flag[ x0 ][ y0 ]                                       ae(v)
  }
  if( cu_qp_delta_enabled_flag && log2CbSize >= Log2MinCuQpDeltaSize ) {
    IsCuQpDeltaCoded = 0
    CuQpDeltaVal = 0
  }
  if( cu_chroma_qp_offset_enabled_flag &&
      log2CbSize >= Log2MinCuChromaQpOffsetSize )
    IsCuChromaQpOffsetCoded = 0
  if( quadtree_mode_enable = = 1 && slice_type = = B ) {
    if( IsHierSplitMode == 1 ) {
      quadtree_splitting( split_cu_l0_flag[ x0 ][ y0 ], PRED_L0, x0, y0, log2CbSize, cqtDepth )
      quadtree_splitting( split_cu_l1_flag[ x0 ][ y0 ], PRED_L1, x0, y0, log2CbSize, cqtDepth )
    } else {
      quadtree_splitting( split_cu_flag[ x0 ][ y0 ], inter_pred_idc[ x0 ][ y0 ], x0, y0, log2CbSize, cqtDepth )
    }
  } else {
    if( split_cu_flag[ x0 ][ y0 ] ) {
      x1 = x0 + ( 1 << ( log2CbSize − 1 ) )
      y1 = y0 + ( 1 << ( log2CbSize − 1 ) )
      coding_quadtree( x0, y0, log2CbSize − 1, cqtDepth + 1 )
      if( x1 < pic_width_in_luma_samples )
        coding_quadtree( x1, y0, log2CbSize − 1, cqtDepth + 1 )
      if( y1 < pic_height_in_luma_samples )
        coding_quadtree( x0, y1, log2CbSize − 1, cqtDepth + 1 )
      if( x1 < pic_width_in_luma_samples && y1 < pic_height_in_luma_samples )
        coding_quadtree( x1, y1, log2CbSize − 1, cqtDepth + 1 )
    } else
      coding_unit( x0, y0, log2CbSize )
  }
}

- Table 6 illustrates an exemplary quadtree splitting syntax, again working from an HEVC-style coding syntax. This new syntax is utilized when the independent splitting mode for each list is enabled. Otherwise, the original quadtree method, as is used in HEVC, is used. As indicated, coding_quadtree_list values may be derived from the inter_pred_idc value, which is signalled in the coding_quadtree syntax. The corresponding list is also passed into this syntax since it will later be used to determine which information, i.e. motion vectors from which corresponding list, will need to be signalled.
-
TABLE 6
Quadtree Splitting Syntax
                                                                    Descriptor
quadtree_splitting( split_cu_flag, current_list, x0, y0, log2CbSize, cqtDepth ) {
  if( split_cu_flag ) {
    x1 = x0 + ( 1 << ( log2CbSize − 1 ) )
    y1 = y0 + ( 1 << ( log2CbSize − 1 ) )
    coding_quadtree_list( current_list, x0, y0, log2CbSize − 1, cqtDepth + 1 )
    if( x1 < pic_width_in_luma_samples )
      coding_quadtree_list( current_list, x1, y0, log2CbSize − 1, cqtDepth + 1 )
    if( y1 < pic_height_in_luma_samples )
      coding_quadtree_list( current_list, x0, y1, log2CbSize − 1, cqtDepth + 1 )
    if( x1 < pic_width_in_luma_samples && y1 < pic_height_in_luma_samples )
      coding_quadtree_list( current_list, x1, y1, log2CbSize − 1, cqtDepth + 1 )
  } else
    coding_unit_list( current_list, x0, y0, log2CbSize )
}

- Table 7 illustrates coding_quadtree_list syntax, which shows a further use of the split_cu_flag to indicate whether the respective coding unit for the list current_list may be split further for multihypothesis coding. Here, the current_list value is passed from higher level syntactic elements.
-
TABLE 7
Coding Quadtree List Syntax
                                                                    Descriptor
coding_quadtree_list( current_list, x0, y0, log2CbSize, cqtDepth ) {
  if( x0 + ( 1 << log2CbSize ) <= pic_width_in_luma_samples &&
      y0 + ( 1 << log2CbSize ) <= pic_height_in_luma_samples &&
      log2CbSize > MinCbLog2SizeY )
    split_cu_flag[ x0 ][ y0 ]                                       ae(v)
  if( cu_qp_delta_enabled_flag && log2CbSize >= Log2MinCuQpDeltaSize ) {
    IsCuQpDeltaCoded = 0
    CuQpDeltaVal = 0
  }
  if( cu_chroma_qp_offset_enabled_flag &&
      log2CbSize >= Log2MinCuChromaQpOffsetSize )
    IsCuChromaQpOffsetCoded = 0
  quadtree_splitting( split_cu_flag[ x0 ][ y0 ], current_list, x0, y0, log2CbSize, cqtDepth )
}

- Table 8 illustrates an exemplary coding unit list syntax, which makes use of the current_list value as passed from higher level syntactic elements. Here, derivation of prediction_unit_list parameters may use the current_list value from higher level syntactic elements.
-
TABLE 8
Coding Unit List Syntax
                                                                    Descriptor
coding_unit_list( current_list, x0, y0, log2CbSize ) {
  if( transquant_bypass_enabled_flag )
    cu_transquant_bypass_flag                                       ae(v)
  cu_skip_flag[ x0 ][ y0 ]                                          ae(v)
  nCbS = ( 1 << log2CbSize )
  if( cu_skip_flag[ x0 ][ y0 ] )
    prediction_unit_list( current_list, x0, y0, nCbS, nCbS )
  else {
    part_mode                                                       ae(v)
    if( PartMode = = PART_2Nx2N )
      prediction_unit_list( current_list, x0, y0, nCbS, nCbS )
    else if( PartMode = = PART_2NxN ) {
      prediction_unit_list( current_list, x0, y0, nCbS, nCbS / 2 )
      prediction_unit_list( current_list, x0, y0 + ( nCbS / 2 ), nCbS, nCbS / 2 )
    } else if( PartMode = = PART_Nx2N ) {
      prediction_unit_list( current_list, x0, y0, nCbS / 2, nCbS )
      prediction_unit_list( current_list, x0 + ( nCbS / 2 ), y0, nCbS / 2, nCbS )
    } else if( PartMode = = PART_2NxnU ) {
      prediction_unit_list( current_list, x0, y0, nCbS, nCbS / 4 )
      prediction_unit_list( current_list, x0, y0 + ( nCbS / 4 ), nCbS, nCbS * 3 / 4 )
    } else if( PartMode = = PART_2NxnD ) {
      prediction_unit_list( current_list, x0, y0, nCbS, nCbS * 3 / 4 )
      prediction_unit_list( current_list, x0, y0 + ( nCbS * 3 / 4 ), nCbS, nCbS / 4 )
    } else if( PartMode = = PART_nLx2N ) {
      prediction_unit_list( current_list, x0, y0, nCbS / 4, nCbS )
      prediction_unit_list( current_list, x0 + ( nCbS / 4 ), y0, nCbS * 3 / 4, nCbS )
    } else if( PartMode = = PART_nRx2N ) {
      prediction_unit_list( current_list, x0, y0, nCbS * 3 / 4, nCbS )
      prediction_unit_list( current_list, x0 + ( nCbS * 3 / 4 ), y0, nCbS / 4, nCbS )
    } else { /* PART_NxN */
      prediction_unit_list( current_list, x0, y0, nCbS / 2, nCbS / 2 )
      prediction_unit_list( current_list, x0 + ( nCbS / 2 ), y0, nCbS / 2, nCbS / 2 )
      prediction_unit_list( current_list, x0, y0 + ( nCbS / 2 ), nCbS / 2, nCbS / 2 )
      prediction_unit_list( current_list, x0 + ( nCbS / 2 ), y0 + ( nCbS / 2 ), nCbS / 2, nCbS / 2 )
    }
  }
}

- Table 9 illustrates an exemplary syntax for coding prediction units in an aspect of the present disclosure. In this aspect, values of ref_idx_l0, mvp_l0_flag, ref_idx_l1, and mvp_l1_flag are signalled at the encoder and decoded at the decoder only for the list current_list, which is passed to the prediction_unit_list process from higher level syntactic elements. This is different from legacy prediction units, where the mode was indicated within the prediction unit and would indicate whether list0 only, list1 only, or biprediction shall be used, and the related parameters would be signalled together within the same prediction unit.
-
TABLE 9
Prediction Unit List Syntax
                                                                    Descriptor
prediction_unit_list( current_list, x0, y0, nPbW, nPbH ) {
  if( cu_skip_flag[ x0 ][ y0 ] ) {
    if( MaxNumMergeCand > 1 )
      merge_idx[ x0 ][ y0 ]                                         ae(v)
  } else { /* MODE_INTER */
    merge_flag[ x0 ][ y0 ]                                          ae(v)
    if( merge_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]                                       ae(v)
    } else {
      if( current_list != PRED_L1 ) {
        if( num_ref_idx_l0_active_minus1 > 0 )
          ref_idx_l0[ x0 ][ y0 ]                                    ae(v)
        mvd_coding( x0, y0, 0 )
        mvp_l0_flag[ x0 ][ y0 ]                                     ae(v)
      }
      if( current_list != PRED_L0 ) {
        if( num_ref_idx_l1_active_minus1 > 0 )
          ref_idx_l1[ x0 ][ y0 ]                                    ae(v)
        if( mvd_l1_zero_flag && inter_pred_idc = = PRED_BI ) {
          MvdL1[ x0 ][ y0 ][ 0 ] = 0
          MvdL1[ x0 ][ y0 ][ 1 ] = 0
        } else
          mvd_coding( x0, y0, 1 )
        mvp_l1_flag[ x0 ][ y0 ]                                     ae(v)
      }
    }
  }
}

- Although not shown above, the foregoing principles may be extended to signalling of local illumination parameters as well. Moreover, the principles may be extended to other coding blocks such as transform coding, overlapped block motion compensation, and deblocking. Transform coding in HEVC, for example, can be performed independently from the specifications of the prediction units.
- In another aspect, overhead signalling may be limited even further by constraining subpartitioning in hierarchical multihypothesis prediction to one and only one of the lists. When a multiple hypothesis coding mode is selected, then, for one of the lists, the partitioning may be fixed. In such a case, no special indication flag would be needed, and a single set of motion parameters would be sent for one list whereas, for the other list(s), the syntax may accommodate further subpartitioning and signalling of additional parameters. Such aspects may be built into coding rules for the syntax, in which case the constraint would be enforced always. Alternatively, the constraint may be activated dynamically by an encoder, in which case it would also be signalled in higher-level constructs within a coding syntax (e.g., the sequence, picture, or slice headers). Moreover, the depth of subpartitioning of each list may be signalled at a similar level. Such depth could also be associated with the depth of the coding unit, in which case further subdivision of a coding unit would not occur beyond a certain block size. Such constraints may have an impact on when an encoder sends split flag parameters in the stream. If, for example, an encoder reaches the maximum splitting depth, no additional split flags need to be transmitted.
- In another aspect, coding syntax may include elements that identify prediction lists that are to be used for pixel blocks. Some coding protocols organize prediction references for bipredictively-coded pixel blocks into lists, commonly represented as “List0” and “List1,” and identify reference frames for prediction using indexes into such lists. Aspects of the present disclosure, however, introduce signalling at higher levels of a syntax (e.g., at the sequence, picture, slice header, or coding tree unit level (or coding unit groups of a certain size in a codec such as HEVC, or macroblocks in a codec like MPEG-4 AVC)) of whether a given list prediction is present in a lower coding unit within the higher level. The signalling may be provided as an “implicit prediction” flag that, when activated for a given list, indicates that the list is to be used for coding of the pixel blocks referenced by the sequence, slice, etc.
- Table 10 provides an example of syntax that may indicate whether prediction lists are present for bipredictively-coded pixel blocks. In this example, the syntax is provided as part of signalling of a slice.
-
TABLE 10
Exemplary Slice Segment Syntax
                                                                    Descriptor
slice_segment_header( ) {
  . . .
  if( slice_type == B ) {
    list0_prediction_implicit_present_flag                          u(1)
    list1_prediction_implicit_present_flag                          u(1)
  }

- In this example, when either of list0_prediction_implicit_present_flag or list1_prediction_implicit_present_flag is set, it indicates that the respective list (e.g., List0, List1) data is always present for coding of the pixel blocks contained within the slice. Again, the syntax may be provided for other syntactic elements in coded video data, such as a sequence, a picture, a coding tree unit level, or other coding unit groups. Table 12 below also illustrates application of the implicit prediction flag syntax to an exemplary HEVC-style slice syntactic element.
- In this manner, for those prediction lists for which the implicit prediction flag is set to TRUE, it is no longer required to provide prediction indications in the data of the coded pixel blocks (for example, a prediction unit) themselves. These flags, when indicated, imply that the corresponding prediction list is always used. For example, if list0_prediction_implicit_present_flag is set, then an inter-predicted block in that slice (excluding Intra, SKIP, and MERGE predicted blocks) would be either a unipredicted block using List0 or a bipredicted block using List0 and another prediction reference. An inter-predicted block could not be a unipredicted block using List1 only. Similarly, if list1_prediction_implicit_present_flag is set, then a block in that slice would be either a unipredicted block using List1 or a bipredicted block using List1 and another prediction reference. If both flags are set TRUE, then the corresponding blocks can only use biprediction using both List0 and
List 1. - The principles of the implicit prediction flags may be extended to MHMC coding using a higher number N of hypotheses (N>2). In such a case, implicit prediction flags may be provided for the list corresponding to each hypothesis. A list_prediction_implicit_present_flag[i] is shown, for example, in Table 1. When a list_prediction_implicit_present_flag[i] is set for a hypothesis i, it signifies that the associated prediction list (e.g., a List) will be used for coding of pixel blocks contained within the respective syntactic elements (e.g., the sequence, picture, slice, etc.).
- It is expected that, by placing the implicit prediction flags in syntactic elements of coded video data that are at higher levels than the coding data of individual pixel blocks (for example, individual prediction units), the bitrate cost of signalling prediction types will be lowered.
- Table 11, for example, indicates how syntax of an HEVC-style prediction unit might be simplified using the implicit prediction flags:
-
TABLE 11
Exemplary HEVC-style Prediction Unit Syntax Modified To Accommodate
Implicit Prediction Flags
                                                                    Descriptor
prediction_unit( x0, y0, nPbW, nPbH ) {
  if( cu_skip_flag[ x0 ][ y0 ] ) {
    if( MaxNumMergeCand > 1 )
      merge_idx[ x0 ][ y0 ]                                         ae(v)
  } else { /* MODE_INTER */
    merge_flag[ x0 ][ y0 ]                                          ae(v)
    if( merge_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]                                       ae(v)
    } else {
      if( slice_type = = B ) {
        if( list0_prediction_implicit_present_flag == 0 )
          list0_pred_idc_flag[ x0 ][ y0 ]                           ae(v)
        else
          list0_pred_idc_flag[ x0 ][ y0 ] = 1
        if( list1_prediction_implicit_present_flag == 0 &&
            list0_pred_idc_flag == 1 )
          /* if list0 is not to be used, it implicitly signals that list1 has
             to be used and thus avoids signalling */
          list1_pred_idc_flag[ x0 ][ y0 ]                           ae(v)
        else
          list1_pred_idc_flag[ x0 ][ y0 ] = 1
      } else { /* slice_type == P */
        list0_pred_idc_flag[ x0 ][ y0 ] = 1
        list1_pred_idc_flag[ x0 ][ y0 ] = 0
      }
      if( list0_pred_idc_flag[ x0 ][ y0 ] == 1 ) {
        if( num_ref_idx_l0_active_minus1 > 0 )
          ref_idx_l0[ x0 ][ y0 ]                                    ae(v)
        mvd_coding( x0, y0, 0 )
        mvp_l0_flag[ x0 ][ y0 ]                                     ae(v)
      }
      if( list1_pred_idc_flag[ x0 ][ y0 ] == 1 ) {
        if( num_ref_idx_l1_active_minus1 > 0 )
          ref_idx_l1[ x0 ][ y0 ]                                    ae(v)
        if( mvd_l1_zero_flag && list0_pred_idc_flag[ x0 ][ y0 ] = = 1 ) {
          MvdL1[ x0 ][ y0 ][ 0 ] = 0
          MvdL1[ x0 ][ y0 ][ 1 ] = 0
        } else
          mvd_coding( x0, y0, 1 )
        mvp_l1_flag[ x0 ][ y0 ]                                     ae(v)
      }
    }
  }
}
As shown above, when the list0_prediction_implicit_present_flag is set to TRUE, then prediction unit syntax will not include the flag list0_pred_idc_flag[x0][y0]. Ordinarily, the list0_pred_idc_flag[x0][y0] specifies whether list0 is used for the current prediction unit; in this case, this is already known from a higher layer. Similarly, when the list1_prediction_implicit_present_flag is set to TRUE, then prediction unit syntax will not include the flag list1_pred_idc_flag[x0][y0]. As noted in Table 11, when list0 is not to be used (list0_pred_idc_flag[x0][y0] equal to 0), it signals implicitly that list1 has to be used, and thus signalling of list1_pred_idc_flag[x0][y0] can be avoided in such a case. - Table 12 illustrates the syntax of Table 1 applied to an exemplary slice segment header, which is derived from the HEVC protocol:
-
TABLE 12
Slice Segment Header To Support Multihypothesis Prediction
                                                                    Descriptor
slice_segment_header( ) {
  first_slice_segment_in_pic_flag                                   u(1)
  if( nal_unit_type >= BLA_W_LP && nal_unit_type <= RSV_IRAP_VCL23 )
    no_output_of_prior_pics_flag                                    u(1)
  slice_pic_parameter_set_id                                        ue(v)
  if( !first_slice_segment_in_pic_flag ) {
    if( dependent_slice_segments_enabled_flag )
      dependent_slice_segment_flag                                  u(1)
    slice_segment_address                                           u(v)
  }
  if( !dependent_slice_segment_flag ) {
    for( i = 0; i < num_extra_slice_header_bits; i++ )
      slice_reserved_flag[ i ]                                      u(1)
    slice_type                                                      ue(v)
    if( slice_type == MH ) {
      number_of_lists_minus2                                        ue(v)
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        list_prediction_implicit_present_flag[ i ]                  u(1)
        log2_min_luma_hypothesis_block_size_minus2[ i ]             ue(v)
        log2_diff_max_min_luma_hypothesis_block_size[ i ]           ue(v)
      }
    } else { /* slice_type == P */
      number_of_lists_minus2 = −1
      list_prediction_implicit_present_flag[ 0 ] = 1
    }
    if( output_flag_present_flag )
      pic_output_flag                                               u(1)
    if( separate_colour_plane_flag = = 1 )
      colour_plane_id                                               u(2)
    if( nal_unit_type != IDR_W_RADL && nal_unit_type != IDR_N_LP ) {
      slice_pic_order_cnt_lsb                                       u(v)
      short_term_ref_pic_set_sps_flag                               u(1)
      if( !short_term_ref_pic_set_sps_flag )
        st_ref_pic_set( num_short_term_ref_pic_sets )
      else if( num_short_term_ref_pic_sets > 1 )
        short_term_ref_pic_set_idx                                  u(v)
      if( long_term_ref_pics_present_flag ) {
        if( num_long_term_ref_pics_sps > 0 )
          num_long_term_sps                                         ue(v)
        num_long_term_pics                                          ue(v)
        for( i = 0; i < num_long_term_sps + num_long_term_pics; i++ ) {
          if( i < num_long_term_sps ) {
            if( num_long_term_ref_pics_sps > 1 )
              lt_idx_sps[ i ]                                       u(v)
          } else {
            poc_lsb_lt[ i ]                                         u(v)
            used_by_curr_pic_lt_flag[ i ]                           u(1)
          }
          delta_poc_msb_present_flag[ i ]                           u(1)
          if( delta_poc_msb_present_flag[ i ] )
            delta_poc_msb_cycle_lt[ i ]                             ue(v)
        }
      }
      if( sps_temporal_mvp_enabled_flag )
        slice_temporal_mvp_enabled_flag                             u(1)
    }
    if( sample_adaptive_offset_enabled_flag ) {
      slice_sao_luma_flag                                           u(1)
      if( ChromaArrayType != 0 )
        slice_sao_chroma_flag                                       u(1)
    }
    if( slice_type = = P | | slice_type = = MH ) {
      num_ref_idx_active_override_flag                              u(1)
      if( num_ref_idx_active_override_flag ) {
        for( i = 0; i < number_of_lists_minus2 + 2; i++ )
          num_ref_idx_active_minus1_list[ i ]                       ue(v)
      }
      if( lists_modification_present_flag && NumPicTotalCurr > 1 )
        ref_pic_lists_modification( )
      mvd_list_zero_flag[ 0 ] = 0
      for( i = 1; i < number_of_lists_minus2 + 2; i++ )
        mvd_list_zero_flag[ i ]                                     u(1)
      if( cabac_init_present_flag )
        cabac_init_flag                                             u(1)
      if( slice_temporal_mvp_enabled_flag ) {
        if( slice_type == MH )
          collocated_from_list                                      ue(v)
        else
          collocated_from_list = 0
        if( num_ref_idx_active_minus1_list[ collocated_from_list ] > 0 )
          collocated_ref_idx                                        ue(v)
      }
      if( ( weighted_pred_flag && slice_type = = P ) | |
          ( weighted_mh_pred_flag && slice_type = = MH ) )
        pred_weight_table( )
      five_minus_max_num_merge_cand                                 ue(v)
      if( motion_vector_resolution_control_idc = = 2 )
        use_integer_mv_flag                                         u(1)
    }
    slice_qp_delta                                                  se(v)
    if( pps_slice_chroma_qp_offsets_present_flag ) {
      slice_cb_qp_offset                                            se(v)
      slice_cr_qp_offset                                            se(v)
    }
    if( pps_slice_act_qp_offsets_present_flag ) {
      slice_act_y_qp_offset                                         se(v)
      slice_act_cb_qp_offset                                        se(v)
      slice_act_cr_qp_offset                                        se(v)
    }
    if( chroma_qp_offset_list_enabled_flag )
      cu_chroma_qp_offset_enabled_flag                              u(1)
    if( deblocking_filter_override_enabled_flag )
      deblocking_filter_override_flag                               u(1)
    if( deblocking_filter_override_flag ) {
      slice_deblocking_filter_disabled_flag                         u(1)
      if( !slice_deblocking_filter_disabled_flag ) {
        slice_beta_offset_div2                                      se(v)
        slice_tc_offset_div2                                        se(v)
      }
    }
    if( pps_loop_filter_across_slices_enabled_flag &&
        ( slice_sao_luma_flag | | slice_sao_chroma_flag | |
          !slice_deblocking_filter_disabled_flag ) )
      slice_loop_filter_across_slices_enabled_flag                  u(1)
  }
  if( tiles_enabled_flag | | entropy_coding_sync_enabled_flag ) {
    num_entry_point_offsets                                         ue(v)
    if( num_entry_point_offsets > 0 ) {
      offset_len_minus1                                             ue(v)
      for( i = 0; i < num_entry_point_offsets; i++ )
        entry_point_offset_minus1[ i ]                              u(v)
    }
  }
  if( slice_segment_header_extension_present_flag ) {
    slice_segment_header_extension_length                           ue(v)
    for( i = 0; i < slice_segment_header_extension_length; i++ )
      slice_segment_header_extension_data_byte[ i ]                 u(8)
  }
  byte_alignment( )
}

- In other aspects, syntax may be developed to identify non-square prediction block sizes, for example, with parameters to define width and height of the minimum and maximum prediction block sizes.
- The slice header syntax of Table 12 may lead to a syntax for a prediction unit that can handle multihypothesis slices, as shown in Table 13.
-
TABLE 13
Prediction Unit Syntax Of HEVC Modified Using The Proposed Method
                                                                    Descriptor
prediction_unit( x0, y0, nPbW, nPbH ) {
  if( cu_skip_flag[ x0 ][ y0 ] ) {
    if( MaxNumMergeCand > 1 )
      merge_idx[ x0 ][ y0 ]                                         ae(v)
  } else { /* MODE_INTER */
    merge_flag[ x0 ][ y0 ]                                          ae(v)
    if( merge_flag[ x0 ][ y0 ] ) {
      if( MaxNumMergeCand > 1 )
        merge_idx[ x0 ][ y0 ]                                       ae(v)
    } else {
      set_lists = 0
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        if( !list_prediction_implicit_present_flag[ i ] &&
            ( ( i < number_of_lists_minus2 + 1 ) | | set_lists > 0 ) )
          list_pred_idc_flag[ i ][ x0 ][ y0 ]                       ae(v)
        else
          list_pred_idc_flag[ i ][ x0 ][ y0 ] = 1
        set_lists += list_pred_idc_flag[ i ][ x0 ][ y0 ]
      }
      for( i = 0; i < number_of_lists_minus2 + 2; i++ ) {
        if( list_pred_idc_flag[ i ][ x0 ][ y0 ] ) {
          if( num_ref_idx_active_minus1_list[ i ] > 0 )
            ref_idx_list[ i ][ x0 ][ y0 ]                           ae(v)
          if( mvd_list_zero_flag[ i ] && list_pred_idc_flag[ 0 ][ x0 ][ y0 ] ) {
            MvdList[ i ][ x0 ][ y0 ][ 0 ] = 0
            MvdList[ i ][ x0 ][ y0 ][ 1 ] = 0
          } else
            mvd_coding( x0, y0, i )
          mvp_list_flag[ i ][ x0 ][ y0 ]                            ae(v)
        }
      }
    }
  }
}
Here, again, when the list_prediction_implicit_present_flag[i] of a list i is set to TRUE, then prediction unit syntax will not include the flag list_pred_idc_flag[i][x0][y0]. - Other aspects can achieve further conservation of signalling overhead. For example, rather than using the control:
-
- if(mvd_list_zero_flag[i] && list_pred_idc_flag[0][x0][y0]) {
a coding syntax may employ an alternate control, as follows: - if(mvd_list_zero_flag[i]) {
In this aspect, all motion vector differences in list i would be forced to zero, which conserves bitrate that otherwise would be allocated to coding of such elements. However, of course, the motion vector field for this list is considerably constrained.
- In another aspect, coding syntax rules may set MvdList to zero only when a block is predicted by more than one list, regardless of which lists are used. For example, the following condition may be applied:
-
- if(mvd_list_zero_flag[i] && set_lists>1) {
This technique increases the flexibility of signalling list prediction for bipredicted and multihypothesis predicted partitions between an encoder and a decoder.
- In a further aspect, signalling may be provided at a lower level than a slice unit that identifies which lists are to be used, while also restricting the reference indices of any list[i] to one. Doing so may save bits for uni-prediction or multihypothesis prediction. Such signalling could be applied, for example, at fixed CTU intervals. For example, signalling may be provided for N CTUs, with N potentially specified at a higher level (such as the sequence parameter sets, picture parameter sets, or a slice, at a CTU level or even lower), but it could also be signalled in a new coding unit, e.g. a CTU Grouping unit, that can have an arbitrary number of CTUs, e.g. M, associated with it. Such a number could be signalled within the CTU Grouping unit, but it could also be omitted and the CTU Grouping unit could contain appropriate termination syntax (start and end codes) that allows one to determine how many CTUs are contained within.
- In such an aspect, coding syntax may provide additional information within a CTU grouping, such as follows:
- a. Indications that a particular list index is enabled or disabled for all CTUs in the CTU grouping unit. An encoder also may enable or disable list indices for uni-prediction or multihypothesis prediction if desired.
- b. For each enabled list index, indications of whether all reference indices or only one will be used. An encoder may provide such indications independently or jointly for uni-prediction or multihypothesis prediction (i.e. for uni-prediction all references of a particular list could be used and only one for multi-hypothesis prediction if both modes are enabled).
- c. If only one reference index is to be used, an explicit selection of which reference index that would be. In this case, if an encoder provides such a selection in a CTU grouping unit, the encoder need not signal the reference index for the blocks contained within the CTU grouping unit.
- Other aspects of the present disclosure provide techniques for motion vector prediction both for bipredictive multihypothesis motion compensation (e.g., up to 2 hypotheses) and for multi-hypotheses prediction (N>2).
- According to such techniques, a coder may select to use, as predictors, one or more motion vectors of the current block from a list or lists that have already been signalled in the bit stream. For example, assume that, by the time list1 motion vectors are to be coded, list0 motion vector parameters have already been signalled for the current block. In this example, the list1 motion vectors for the same block may be predicted directly from information of the list0 coding, instead of the surrounding partitions. That is, an encoder may compute a predictor vector as
-
$\overrightarrow{mvpListL1} = a \times \overrightarrow{mvListL0} + \vec{\beta}$, where (Eq. 6) - a is a scaling factor that relates to the distances of the references in L0 and L1 compared to the current picture, and $\vec{\beta}$ is a constant that could be sent at a higher level syntax structure (e.g. the sequence parameter sets, picture parameter sets, or the slice header). The value of a, for example, can be computed using the picture order count (poc) information associated with each picture,
-
- The use of this predictor also may be signalled implicitly, i.e. enabled always for a partition. Further, an encoder may signal which predictor should be used from a possible list of motion vector predictor candidates using a parameter such as a flag. If, for example, this parameter or flag is set to 0, then the collocated vector in the same block from the other list would be used, otherwise another predictor, e.g. a predictor based on a similar derivation as in HEVC, or a median, or averaging candidate of the spatial or temporal neighbourhood motion vectors could be used.
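- A minimal C sketch of the Eq. 6 predictor; deriving a as a ratio of POC distances is an assumption consistent with the text's POC-based description, and the integer rounding is illustrative:

    typedef struct { int x, y; } MV;

    /* Predict the List1 vector from the decoded List0 vector of the same
       block: mvp_l1 = a * mv_l0 + beta, with a assumed to be the ratio
       (poc_cur - poc_ref_l1) / (poc_cur - poc_ref_l0). */
    MV predict_l1_from_l0(MV mv_l0, int poc_cur, int poc_ref_l0,
                          int poc_ref_l1, MV beta) {
        int num = poc_cur - poc_ref_l1;
        int den = poc_cur - poc_ref_l0;
        MV mvp;
        mvp.x = (den != 0 ? (num * mv_l0.x) / den : mv_l0.x) + beta.x;
        mvp.y = (den != 0 ? (num * mv_l0.y) / den : mv_l0.y) + beta.y;
        return mvp;
    }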
- These principles may be extended to hypotheses coming from lists with index higher than 1. In one aspect, prediction could come from list0, if the prediction is always available, or from the lowest index available list used for prediction. In another aspect, signalling may be provided to identify the list that is to be used for predicting the parameters of the current list. Such signalling could be done globally (e.g. at a higher syntax element such as the sequence or picture parameter sets, or the slice header), where an encoder may signal how vectors of a particular list, when used in multihypothesis prediction, would be predicted, or done at the block level using a parameter similar to conventional mvp_lX_flag parameters.
- In another aspect, an encoder may combine the parameters mvp_lX_flags into a single parameter that signals predictors jointly for all lists. This aspect may reduce signalling overhead as compared to cases in which signalling is provided independently for each hypothesis. That is, instead of signalling that L0 and L1 would use the first predictor using the value of 0 for mvp_l0_flag and mvp_l1_flag independently, in this new parameter mvp_selector, if its value is 0 both lists are selected from the same first predictor. If the value is 1, then both are selected from the second predictor, if that is available, whereas if its value is equal to 2, then for L0 it indicates that an encoder used its first predictor and, for L1, its second predictor and so on. Such correspondence between the value of mvp_selector and which predictor to use for each list could be pre-specified as a syntax coding rule, or it could also be signalled inside the bit stream at a higher syntax element (e.g. in the sequence or picture parameter sets, or the slice header). The mvp_selector could be used with more than 2 lists.
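- A minimal C sketch of one such joint mapping for two lists, following the enumeration described above; the handling of values beyond 2 is an assumption:

    /* Map a decoded mvp_selector value to per-list predictor indices:
       0 -> both lists use their first predictor, 1 -> both use their second,
       2 -> L0 first / L1 second, 3 -> L0 second / L1 first (assumed). */
    void mvp_selector_to_indices(int mvp_selector, int *idx_l0, int *idx_l1) {
        switch (mvp_selector) {
            case 0:  *idx_l0 = 0; *idx_l1 = 0; break;
            case 1:  *idx_l0 = 1; *idx_l1 = 1; break;
            case 2:  *idx_l0 = 0; *idx_l1 = 1; break;
            default: *idx_l0 = 1; *idx_l1 = 0; break;
        }
    }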
- In another aspect, an encoder may entropy code the mvp_selector element using an entropy coding scheme such as UVLC (exp-Golomb coding) or Context Adaptive Binary Arithmetic Coding (CABAC). In the case of CABAC, occurrence statistics may be used to update probability estimates and thus how the value is encoded by an adaptive arithmetic encoding engine. The statistics collected would be affected by how many lists are enabled at each point: if a single list is to be used, there are only two possible state values, for example, whereas if 2 lists are used, more states are needed. To maintain efficiency, an encoder may maintain a table of mvp_selector contexts depending on the number and/or characteristics of the combined lists, with entropy-coding adaptation for each entry in the table done independently. In another aspect, a single entry may be maintained, but the statistics may be normalized, as coding proceeds, based on the number of lists used for the current block. If, for example, only uniprediction is to be used, values of mvp_selector above 1 would not make sense; in that case, the normalization would be based only on the number of occurrences of values 0 and 1.
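A sketch of the normalization idea follows, assuming a simple frequency-count probability model in place of a full CABAC engine; the symbol count and the Laplace smoothing are assumptions for the example.

```c
/* Hypothetical frequency-count model for mvp_selector symbols. When only
 * values 0..max_valid are meaningful for the current block (e.g.,
 * max_valid = 1 under uniprediction), the probability mass is
 * renormalized over just those symbols. */
#define MVP_SELECTOR_SYMBOLS 4

typedef struct {
    unsigned count[MVP_SELECTOR_SYMBOLS];
} SelectorStats;

/* Returns the probability (scaled to 1<<15) of `symbol`, normalizing
 * only over the symbols valid for the current block. */
static unsigned selector_probability(const SelectorStats *s,
                                     unsigned symbol, unsigned max_valid) {
    unsigned total = 0;
    for (unsigned i = 0; i <= max_valid; i++)
        total += s->count[i] + 1;   /* +1: simple Laplace smoothing */
    return ((s->count[symbol] + 1) << 15) / total;
}
```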
- Prediction of motion vectors could also be performed jointly using both neighbouring (spatial and/or temporal) and collocated (motion vectors for the current block but from another list) candidates. In particular, an encoder may use the information of both the mvd (motion vector error) and the mvp (motion vector predictor) from the collocated candidate to derive an mvp for the current list (e.g., list[i], i=1) as follows:
(Eq. 7: derivation of the list1 mvp from the neighbouring predictor together with the list0 mvp and a scaled list0 mvd; equation image not reproduced.)
- In this derivation, prediction uses the neighbouring predictor as well as the predictor of the list0 vector and its motion vector residual error. In another aspect, the scaling factor applied to the mvd could be replaced by a single value that is related to the temporal distances between the references. This alternative may reduce complexity at the expense of lowered accuracy.
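A sketch of such a derivation follows, using the simplified temporal-distance-based scale for the collocated mvd. Because the exact combination in Eq. 7 is not reproduced above, the equal-weight blend of the two sources below is an assumption made for illustration.

```c
#include <stdint.h>

typedef struct { int16_t x, y; } MotionVector;

/* Derive an mvp for list1 from (a) a neighbouring candidate and
 * (b) the collocated list0 predictor plus its coded residual (mvd),
 * scaled by the ratio of temporal distances (1/256 fixed point). */
static MotionVector derive_mvp_l1_joint(MotionVector neigh,
                                        MotionVector mvp_l0, MotionVector mvd_l0,
                                        int poc_cur, int poc_ref_l0, int poc_ref_l1) {
    int d0 = poc_cur - poc_ref_l0;
    int d1 = poc_cur - poc_ref_l1;
    int scale = (d0 != 0) ? (d1 * 256) / d0 : 256;

    /* Reconstructed list0 vector, rescaled toward the list1 reference. */
    int cx = ((mvp_l0.x + mvd_l0.x) * scale) >> 8;
    int cy = ((mvp_l0.y + mvd_l0.y) * scale) >> 8;

    MotionVector mvp;
    mvp.x = (int16_t)((neigh.x + cx) / 2);   /* assumed equal weighting */
    mvp.y = (int16_t)((neigh.y + cy) / 2);
    return mvp;
}
```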
- In another aspect, syntax rules may impose the use of a single predictor per list and identify, for the current block, which earlier-encoded list could be used to predict a subsequent list. In such a case, the syntax still may permit the use of more than one list to predict the current list. This could be done, for example, using averaging, either linear or nonlinear. For example, predictors may be derived as:
$\overrightarrow{mvp}_{L_X} = \sum_{i=0}^{X-1} w_i \cdot \overrightarrow{mv}_{L_i}$  (Eq. 8)

- In the above, the $w_i$ are weights that may be either pre-specified by syntax rules or signalled at a higher coding level. The weights need not sum to 1 (i.e., $\sum_{i=0}^{X-1} w_i \neq 1$ is permitted).
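A sketch of this weighted combination over the earlier lists' vectors follows; the 1/256 fixed-point weight representation is an assumption for the example.

```c
#include <stdint.h>

typedef struct { int16_t x, y; } MotionVector;

/* Form an mvp for list X as a weighted sum of the vectors of earlier
 * lists 0..X-1 (Eq. 8). Weights are in 1/256 fixed-point units and
 * need not sum to 256 (i.e., to 1.0). */
static MotionVector weighted_mvp(const MotionVector *mv_earlier,
                                 const int *w, int num_lists) {
    int sx = 0, sy = 0;
    for (int i = 0; i < num_lists; i++) {
        sx += w[i] * mv_earlier[i].x;
        sy += w[i] * mv_earlier[i].y;
    }
    MotionVector mvp = { (int16_t)(sx >> 8), (int16_t)(sy >> 8) };
    return mvp;
}
```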
- In another aspect, which mvds from earlier lists to use may be indicated by an mvp_selector parameter signalled in the prediction unit.
- In a further aspect, an encoder may use a single predictor for multiple hypotheses, or may tie the computation of the predictors for multiple hypotheses together. Such coding decisions may be signalled in the coded bit stream or may be defined by the coding rules of its syntax.
- In one example, mvp_l0[ref_idx_l0] and mvp_l1[ref_idx_l1] can be tied together. This could be done by signalling, in a higher-level syntax structure (e.g. at the sequence level, the picture level, or the slice level), which lists and which references will be considered at the same time; such signalling could also be provided at the level of a group of CTUs. The relationship between the two mvps could be derived using a simple scaling factor (scale_factor) that could be signalled explicitly or, alternatively, could be determined from the POC differences of the two references. In such a case, instead of forming mvp_l0[ref_idx_l0] from one candidate set and mvp_l1[ref_idx_l1] from another, a decoder can form both at the same time from a combined candidate set. This can be achieved in a variety of ways; for instance, if mvp_l0[ref_idx_l0] is computed as the median vector of three candidates, the decoder can re-formulate that as minimizing a sum over three terms:
$\arg\min_{x} \left\{ \sum_{i} \left| x - \overrightarrow{mv}_{\mathrm{candidates}}[i] \right| \right\}$  (Eq. 9)

- With combined candidates, the decoder instead minimizes a similar sum with six terms.
- In the above, mvp_l1[ref_idx_l1] can then be computed as scale_factor*mvp_l0[ref_idx_l0]. Other motion vectors, such as mvp_l0[0:L], can be determined together in a similar fashion. This concept can also be extended to multiple lists (>2). A sketch of this combined derivation follows.
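Per component, the minimizer of Eq. 9 is a component-wise median, so a sketch of the combined derivation might map the L1 candidates into the L0 domain, take the median over all six values per component, and scale the result back. The 1/256 fixed-point scale factor and the zero-guard are assumptions for the example.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { int16_t x, y; } MotionVector;

static int cmp_int(const void *a, const void *b) {
    return *(const int *)a - *(const int *)b;
}

/* Median of n values (a per-component minimizer of Eq. 9). */
static int median_n(int *v, int n) {
    qsort(v, n, sizeof(int), cmp_int);
    return v[n / 2];
}

/* Form mvp_l0 and mvp_l1 jointly from a combined candidate set: the
 * three L1 candidates are mapped into the L0 domain by the inverse of
 * scale_factor (1/256 fixed point), the component-wise median over all
 * six values gives mvp_l0, and mvp_l1 = scale_factor * mvp_l0. */
static void joint_mvp(const MotionVector cand_l0[3], const MotionVector cand_l1[3],
                      int scale_factor, MotionVector *mvp_l0, MotionVector *mvp_l1) {
    if (scale_factor == 0) scale_factor = 256;  /* guard for the sketch */
    int xs[6], ys[6];
    for (int i = 0; i < 3; i++) {
        xs[i] = cand_l0[i].x;
        ys[i] = cand_l0[i].y;
        xs[3 + i] = (cand_l1[i].x * 256) / scale_factor;  /* map to L0 domain */
        ys[3 + i] = (cand_l1[i].y * 256) / scale_factor;
    }
    mvp_l0->x = (int16_t)median_n(xs, 6);
    mvp_l0->y = (int16_t)median_n(ys, 6);
    mvp_l1->x = (int16_t)((scale_factor * mvp_l0->x) >> 8);
    mvp_l1->y = (int16_t)((scale_factor * mvp_l0->y) >> 8);
}
```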
- In another aspect, the motion vector of a given pixel block may be predicted from other pixel blocks that are neighbours of the pixel block being coded. A bi-predictive or multi-hypothesis candidate may be based on pixel blocks in a spatial neighbourhood of the block being coded, which may be represented by an index j into a merge_idx list. Candidate j is associated with motion vectors and references for all its lists, e.g. L0 and L1. A coder may introduce an additional candidate (called j+1) that uses the same predictor for L0 but, for L1, reuses only the reference index. For the L1 motion vector candidate, an encoder may scale the L0 motion vector based on the relative temporal displacement between the L1 and L0 references. In another case, an encoder may select a first, unipredicted candidate (e.g. L0) for skip or merge, with the consequence that no neighbours would provide candidates for another list. In such a case, an encoder may indicate, using merge_idx, a candidate for another list (say, L1) derived from the L0 candidate, which enables biprediction or multi-hypothesis prediction using another merge_idx. A sketch of constructing such a derived candidate appears below.
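In this sketch, the merge-candidate structure and the POC-based scaling are assumptions made for illustration:

```c
#include <stdint.h>

typedef struct { int16_t x, y; } MotionVector;

/* Hypothetical merge candidate carrying per-list vectors and refs. */
typedef struct {
    MotionVector mv_l0, mv_l1;
    int ref_idx_l0, ref_idx_l1;
    int has_l0, has_l1;
} MergeCandidate;

/* From candidate j, derive candidate j+1: keep the L0 predictor as-is,
 * reuse only the L1 reference index, and synthesize the L1 vector by
 * scaling the L0 vector by the ratio of temporal distances. */
static MergeCandidate derive_candidate(const MergeCandidate *j,
                                       int poc_cur, int poc_ref_l0, int poc_ref_l1) {
    MergeCandidate d = *j;
    int d0 = poc_cur - poc_ref_l0;
    int d1 = poc_cur - poc_ref_l1;
    int scale = (d0 != 0) ? (d1 * 256) / d0 : 256;   /* 1/256 fixed point */
    d.mv_l1.x = (int16_t)((j->mv_l0.x * scale) >> 8);
    d.mv_l1.y = (int16_t)((j->mv_l0.y * scale) >> 8);
    d.has_l1 = 1;  /* the derived candidate enables biprediction */
    return d;
}
```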
- The foregoing techniques may be extended to coding with Overlapped Block Motion Compensation under multi-hypothesis prediction, as well as to affine block motion compensation, for coding systems that support those features.
- The foregoing discussion has described operation of the aspects of the present disclosure in the context of video coders and decoders. Commonly, these components are provided as electronic devices. Video decoders and/or controllers can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays, and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on camera devices, personal computers, notebook computers, tablet computers, smartphones, or computer servers. Such computer programs typically are stored in physical storage media such as electronic-, magnetic-, and/or optically-based storage devices, from which they are read by a processor and executed. Decoders commonly are packaged in consumer electronics devices, such as smartphones, tablet computers, gaming systems, DVD players, portable media players and the like; and they also can be packaged in consumer software applications such as video games, media players, media editors, and the like. And, of course, these components may be provided as hybrid systems that distribute functionality across dedicated hardware components and programmed general-purpose processors, as desired.
- Video coders and decoders may exchange video through channels in a variety of ways. They may communicate with each other via communication and/or computer networks, as illustrated in FIG. 1. In still other applications, video coders may output video data to storage devices, such as electrical, magnetic and/or optical storage media, which may be provided to decoders sometime later. In such applications, the decoders may retrieve the coded video data from the storage devices and decode it.
- Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Claims (33)
1. A video coding method, comprising:
developing a plurality of coding hypotheses for an input pixel block of frame content, each coding hypothesis including generating prediction data for the input pixel block according to a respective prediction search,
coding the input pixel block with reference to a prediction block formed from prediction data of the plurality of hypotheses, and
transmitting data of the coded pixel block and data identifying a number of the hypotheses used during the coding to a channel.
2. The method of claim 1 , wherein the number of hypotheses is greater than 2.
3. The method of claim 1 , further comprising transmitting data representing a size of partitions used to generate the prediction data of a hypothesis for the input pixel block.
4. The method of claim 1 , further comprising transmitting data representing ranges of allowable partition sizes used to generate the prediction data of a hypothesis for the input pixel block.
5. The method of claim 1 , further comprising, for a coding hypothesis, transmitting index data identifying an element from a list of prediction references identifying a reference frame that was selected according to the prediction search of the hypothesis.
6. The method of claim 1 , further comprising, providing an idc_pred_idc value that identifies a prediction mode for the pixel block.
7. The method of claim 6 , wherein the idc_pred_idc value is provided in a syntax element corresponding to the pixel block.
8. The method of claim 6 , wherein the idc_pred_idc value is provided in a syntax element higher than the pixel block in a coded data stream, and it applies to multiple coded pixel blocks in the coded data stream.
9. The method of claim 1, further comprising, providing an identification of a prediction list of a hypothesis in a coded data stream, the identification provided in a syntax element that is higher than a syntax element of the coded pixel block data.
10. The method of claim 1 , further comprising, providing a motion vector for one of the hypotheses that is a prediction from a motion vector of another one of the hypotheses.
11. Coded video data, stored in a computer readable medium, representing frame content, the coded video data comprising:
coded video data of a plurality of pixel blocks, each represented by a plurality of coding hypotheses, each coding hypothesis representing prediction data for the pixel block according to a respective prediction search, and
a parameter identifying an element from a list of prediction references identifying a reference frame, the parameter applying to hypotheses of multiple coded pixel blocks.
12. An encoder that generates the coded video data of claim 11 from input video.
13. A decoder that generates reconstructed video from the coded video data of claim 11 .
14. The coded video data of claim 11 , further comprising data representing a size of partitions used to generate the prediction data of a hypothesis for the input pixel block.
15. The coded video data of claim 11 , further comprising data representing ranges of allowable partition sizes used to generate the prediction data of a hypothesis for the input pixel block.
16. The coded video data of claim 11 , further comprising, for a coding hypothesis, index data identifying an element from a list of prediction references identifying a reference frame that was selected according to the prediction search of the hypothesis.
17. The coded video data of claim 11 , further comprising an idc_pred_idc value that identifies a prediction mode for the pixel block.
18. The coded video data of claim 17 , wherein the idc_pred_idc value is placed in a syntax element of the pixel block.
19. The coded video data of claim 17 , wherein the idc_pred_idc value is placed in a syntax element higher than the pixel block, and it applies to multiple coded pixel blocks in the coded data stream.
20. The coded video data of claim 11, further comprising data identifying a prediction list of a hypothesis in a coded data stream, the identification provided in a syntax element that is higher than a syntax element of the coded pixel block data.
21. The coded video data of claim 11, further comprising data representing a motion vector for one of the hypotheses that is a prediction from a motion vector of another one of the hypotheses.
22. A video decoding method, comprising:
responsive to data provided from a channel identifying a number of the coding hypotheses applied to a pixel block to be decoded, developing a prediction block for the pixel block from coding data representing the hypotheses, each coding hypothesis identifying a respective prediction source, and
decoding the pixel block with reference to the prediction block.
23. The method of claim 22 , wherein the number of hypotheses is greater than 2.
24. The method of claim 22 , further comprising responsive to channel data identifying a size of partitions used to generate the prediction data of a respective hypothesis, developing the prediction data for the respective hypothesis by extracting data from a reference frame at a size corresponding to the partition size.
25. The method of claim 22 , further comprising, responsive to channel data identifying an idc_pred_idc value that identifies a prediction mode for a pixel block, decoding coded subpartitions of the pixel block according to the identified prediction mode.
26. The method of claim 25 , wherein the idc_pred_idc value is provided in a syntax element corresponding to the pixel block.
27. The method of claim 25 , wherein the idc_pred_idc value is provided in a syntax element higher than the pixel block in a coded data stream, and it applies to multiple coded pixel blocks in the coded data stream.
28. The method of claim 25 , further comprising, responsive to an identification of a prediction list of a hypothesis in a coded data stream, developing prediction data for a corresponding hypothesis according to a reference frame identified by the identification.
29. The method of claim 28, wherein the identification occurs in a syntactic element of the coded data stream that is at a higher level than a syntactic element of the coded pixel block.
30. The method of claim 25 , further comprising, predicting a motion vector for one of the hypotheses from a motion vector of another one of the hypotheses.
31. A video coding method, comprising:
developing a first coding hypothesis for an input pixel block of frame content, the first coding hypothesis including prediction data selected using a first partition size,
developing a second coding hypothesis for the input pixel block, the second coding hypothesis including prediction data selected using a second partition size different from the first partition size, and
coding the input pixel block with reference to a prediction block formed from the prediction data of the first coding hypothesis and the prediction data of the second coding hypothesis.
32. A video decoding method, comprising:
developing first prediction data from coded video data representing a first coding hypothesis of a pixel block to be decoded, the coded video data identifying a first partition size associated with the first prediction data,
developing second prediction data from coded video data representing a second coding hypothesis of a pixel block to be decoded, the coded video data identifying a second partition size associated with the second prediction data, the second partition size different from the first partition size, and
decoding coded video data of the pixel block to be decoded with reference to a prediction block formed from the first prediction data and the second prediction data.
33. Coded video data, stored in a computer readable medium, representing a pixel block of frame content having been predictively coded, the coded video data comprising:
data representing a first coding hypothesis for the pixel block of frame content, the first coding hypothesis identifying a first partition size of the first coding hypothesis,
data representing a second coding hypothesis for the pixel block of frame content, the second coding hypothesis identifying a second partition size for the second coding hypothesis, the second partition size different from the first partition size; and
data representing content of the pixel block having been coded with reference to a prediction block formed from the prediction data of the first coding hypothesis and the prediction data of the second coding hypothesis.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/257,904 US20190246114A1 (en) | 2018-02-02 | 2019-01-25 | Techniques of multi-hypothesis motion compensation |
US16/879,007 US11463707B2 (en) | 2018-02-02 | 2020-05-20 | Techniques of multi-hypothesis motion compensation |
US17/894,309 US11924440B2 (en) | 2018-02-05 | 2022-08-24 | Techniques of multi-hypothesis motion compensation |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862625547P | 2018-02-02 | 2018-02-02 | |
US201862626276P | 2018-02-05 | 2018-02-05 | |
US16/257,904 US20190246114A1 (en) | 2018-02-02 | 2019-01-25 | Techniques of multi-hypothesis motion compensation |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/879,007 Division US11463707B2 (en) | 2018-02-02 | 2020-05-20 | Techniques of multi-hypothesis motion compensation |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190246114A1 true US20190246114A1 (en) | 2019-08-08 |
Family
ID=65409537
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/257,904 Abandoned US20190246114A1 (en) | 2018-02-02 | 2019-01-25 | Techniques of multi-hypothesis motion compensation |
US16/879,007 Active 2039-01-26 US11463707B2 (en) | 2018-02-02 | 2020-05-20 | Techniques of multi-hypothesis motion compensation |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/879,007 Active 2039-01-26 US11463707B2 (en) | 2018-02-02 | 2020-05-20 | Techniques of multi-hypothesis motion compensation |
Country Status (3)
Country | Link |
---|---|
US (2) | US20190246114A1 (en) |
CN (1) | CN112236995B (en) |
WO (1) | WO2019152283A1 (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10904549B2 (en) * | 2018-12-13 | 2021-01-26 | Tencent America LLC | Method and apparatus for signaling of multi-hypothesis for skip and merge mode and signaling of distance offset table in merge with motion vector difference |
US11218721B2 (en) * | 2018-07-18 | 2022-01-04 | Mediatek Inc. | Method and apparatus of motion compensation bandwidth reduction for video coding system utilizing multi-hypothesis |
US20220201322A1 (en) * | 2020-12-23 | 2022-06-23 | Qualcomm Incorporated | Multiple hypothesis prediction for video coding |
CN114902667A (en) * | 2019-11-05 | 2022-08-12 | Lg电子株式会社 | Image or video coding based on chroma quantization parameter offset information |
CN115299061A (en) * | 2020-02-29 | 2022-11-04 | 抖音视界有限公司 | Signaling of syntax elements for reference picture indication |
WO2022242651A1 (en) * | 2021-05-17 | 2022-11-24 | Beijing Bytedance Network Technology Co., Ltd. | Method, device, and medium for video processing |
CN115988219A (en) * | 2020-01-12 | 2023-04-18 | 华为技术有限公司 | Method and apparatus for coordinated weighted prediction using non-rectangular fusion mode |
US20240195978A1 (en) * | 2022-12-13 | 2024-06-13 | Apple Inc. | Joint motion vector coding |
US20240205413A1 (en) * | 2021-04-19 | 2024-06-20 | Zte Corporation | Picture encoding method and apparatus, picture decoding method and apparatus, electronic device and storage medium |
US20240314343A1 (en) * | 2023-03-13 | 2024-09-19 | Tencent America LLC | Block based weighting factor for joint motion vector difference coding mode |
WO2025002201A1 (en) * | 2023-06-27 | 2025-01-02 | Douyin Vision Co., Ltd. | Method, apparatus, and medium for video processing |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022081878A1 (en) * | 2020-10-14 | 2022-04-21 | Beijing Dajia Internet Information Technology Co., Ltd. | Methods and apparatuses for affine motion-compensated prediction refinement |
Family Cites Families (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6807231B1 (en) * | 1997-09-12 | 2004-10-19 | 8×8, Inc. | Multi-hypothesis motion-compensated video image predictor |
US7003035B2 (en) | 2002-01-25 | 2006-02-21 | Microsoft Corporation | Video coding methods and apparatuses |
US20040001546A1 (en) * | 2002-06-03 | 2004-01-01 | Alexandros Tourapis | Spatiotemporal prediction for bidirectionally predictive (B) pictures and motion vector prediction for multi-picture reference motion compensation |
US8457203B2 (en) * | 2005-05-26 | 2013-06-04 | Ntt Docomo, Inc. | Method and apparatus for coding motion and prediction weighting parameters |
WO2007092192A2 (en) | 2006-02-02 | 2007-08-16 | Thomson Licensing | Method and apparatus for motion estimation using combined reference bi-prediction |
US8346000B2 (en) | 2007-08-01 | 2013-01-01 | The Board Of Trustees Of The Leland Stanford Junior University | Systems, methods, devices and arrangements for motion-compensated image processing and coding |
KR20090094595A (en) * | 2008-03-03 | 2009-09-08 | 삼성전자주식회사 | Method and appratus for encoding images using motion prediction by multiple reference, and method and apparatus for decoding images using motion prediction by multiple reference |
WO2010017166A2 (en) | 2008-08-04 | 2010-02-11 | Dolby Laboratories Licensing Corporation | Overlapped block disparity estimation and compensation architecture |
JPWO2010035733A1 (en) | 2008-09-24 | 2012-02-23 | ソニー株式会社 | Image processing apparatus and method |
WO2011005624A1 (en) | 2009-07-04 | 2011-01-13 | Dolby Laboratories Licensing Corporation | Encoding and decoding architectures for format compatible 3d video delivery |
US9118929B2 (en) | 2010-04-14 | 2015-08-25 | Mediatek Inc. | Method for performing hybrid multihypothesis prediction during video coding of a coding unit, and associated apparatus |
US8837592B2 (en) | 2010-04-14 | 2014-09-16 | Mediatek Inc. | Method for performing local motion vector derivation during video coding of a coding unit, and associated apparatus |
KR101782929B1 (en) * | 2010-05-26 | 2017-09-28 | 엘지전자 주식회사 | Method and apparatus for processing a video signal |
EP2596636A1 (en) | 2010-07-21 | 2013-05-29 | Dolby Laboratories Licensing Corporation | Reference processing using advanced motion models for video coding |
ES2859635T3 (en) * | 2010-10-08 | 2021-10-04 | Ge Video Compression Llc | Image encoding that supports block partitioning and block merging |
US9143799B2 (en) * | 2011-05-27 | 2015-09-22 | Cisco Technology, Inc. | Method, apparatus and computer program product for image motion prediction |
WO2013006573A1 (en) * | 2011-07-01 | 2013-01-10 | General Instrument Corporation | Joint sub-pixel interpolation filter for temporal prediction |
KR20140057373A (en) | 2011-08-30 | 2014-05-12 | 노키아 코포레이션 | An apparatus, a method and a computer program for video coding and decoding |
WO2013059470A1 (en) | 2011-10-21 | 2013-04-25 | Dolby Laboratories Licensing Corporation | Weighted predictions based on motion information |
US20130176390A1 (en) | 2012-01-06 | 2013-07-11 | Qualcomm Incorporated | Multi-hypothesis disparity vector construction in 3d video coding with depth |
US9635356B2 (en) | 2012-08-07 | 2017-04-25 | Qualcomm Incorporated | Multi-hypothesis motion compensation for scalable video coding and 3D video coding |
WO2015010317A1 (en) | 2013-07-26 | 2015-01-29 | 北京大学深圳研究生院 | P frame-based multi-hypothesis motion compensation method |
US9854246B2 (en) | 2014-02-28 | 2017-12-26 | Apple Inc. | Video encoding optimization with extended spaces |
CN107211161B (en) | 2015-03-10 | 2020-05-15 | 苹果公司 | Video coding optimization of extended space including last stage processing |
US20160360234A1 (en) | 2015-06-03 | 2016-12-08 | Apple Inc. | Techniques For Resource Conservation During Performance Of Intra Block Copy Prediction Searches |
WO2017130696A1 (en) * | 2016-01-29 | 2017-08-03 | シャープ株式会社 | Prediction image generation device, moving image decoding device, and moving image encoding device |
EP3456049B1 (en) * | 2016-05-13 | 2022-05-04 | VID SCALE, Inc. | Systems and methods for generalized multi-hypothesis prediction for video coding |
US11153594B2 (en) | 2016-08-29 | 2021-10-19 | Apple Inc. | Multidimensional quantization techniques for video coding/decoding systems |
CN110999288A (en) * | 2017-08-22 | 2020-04-10 | 索尼公司 | Image processor and image processing method |
US10986343B2 (en) * | 2018-04-15 | 2021-04-20 | Arris Enterprises Llc | Reducing overhead for multiple-hypothesis temporal prediction |
WO2020015706A1 (en) * | 2018-07-18 | 2020-01-23 | Mediatek Inc. | Method and apparatus of motion compensation bandwidth reduction for video coding system utilizing multi-hypothesis |
- 2019
- 2019-01-25 CN CN201980011338.8A patent/CN112236995B/en active Active
- 2019-01-25 US US16/257,904 patent/US20190246114A1/en not_active Abandoned
- 2019-01-25 WO PCT/US2019/015247 patent/WO2019152283A1/en active Application Filing
- 2020
- 2020-05-20 US US16/879,007 patent/US11463707B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110002388A1 (en) * | 2009-07-02 | 2011-01-06 | Qualcomm Incorporated | Template matching for video coding |
US20140003489A1 (en) * | 2012-07-02 | 2014-01-02 | Nokia Corporation | Method and apparatus for video coding |
US20160191920A1 (en) * | 2013-08-09 | 2016-06-30 | Samsung Electronics Co., Ltd. | Method and apparatus for determining merge mode |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11218721B2 (en) * | 2018-07-18 | 2022-01-04 | Mediatek Inc. | Method and apparatus of motion compensation bandwidth reduction for video coding system utilizing multi-hypothesis |
US11917185B2 (en) | 2018-07-18 | 2024-02-27 | Hfi Innovation Inc. | Method and apparatus of motion compensation bandwidth reduction for video coding system utilizing multi-hypothesis |
US10904549B2 (en) * | 2018-12-13 | 2021-01-26 | Tencent America LLC | Method and apparatus for signaling of multi-hypothesis for skip and merge mode and signaling of distance offset table in merge with motion vector difference |
US12225216B2 (en) | 2019-11-05 | 2025-02-11 | Lg Electronics Inc. | Image or video coding based on chroma quantization parameter offset information |
CN114902667A (en) * | 2019-11-05 | 2022-08-12 | Lg电子株式会社 | Image or video coding based on chroma quantization parameter offset information |
CN115988219A (en) * | 2020-01-12 | 2023-04-18 | 华为技术有限公司 | Method and apparatus for coordinated weighted prediction using non-rectangular fusion mode |
US12075045B2 (en) | 2020-01-12 | 2024-08-27 | Huawei Technologies Co., Ltd. | Method and apparatus of harmonizing weighted prediction with non-rectangular merge modes |
CN115299061A (en) * | 2020-02-29 | 2022-11-04 | 抖音视界有限公司 | Signaling of syntax elements for reference picture indication |
US12316878B2 (en) | 2020-02-29 | 2025-05-27 | Beijing Bytedance Network Technology Co., Ltd. | Reference picture information signaling in a video bitstream |
US20220201322A1 (en) * | 2020-12-23 | 2022-06-23 | Qualcomm Incorporated | Multiple hypothesis prediction for video coding |
US20240205413A1 (en) * | 2021-04-19 | 2024-06-20 | Zte Corporation | Picture encoding method and apparatus, picture decoding method and apparatus, electronic device and storage medium |
WO2022242651A1 (en) * | 2021-05-17 | 2022-11-24 | Beijing Bytedance Network Technology Co., Ltd. | Method, device, and medium for video processing |
US20240195978A1 (en) * | 2022-12-13 | 2024-06-13 | Apple Inc. | Joint motion vector coding |
US12238317B2 (en) * | 2023-03-13 | 2025-02-25 | Tencent America LLC | Block based weighting factor for joint motion vector difference coding mode |
US20240314343A1 (en) * | 2023-03-13 | 2024-09-19 | Tencent America LLC | Block based weighting factor for joint motion vector difference coding mode |
WO2025002201A1 (en) * | 2023-06-27 | 2025-01-02 | Douyin Vision Co., Ltd. | Method, apparatus, and medium for video processing |
Also Published As
Publication number | Publication date |
---|---|
CN112236995A (en) | 2021-01-15 |
CN112236995B (en) | 2024-08-06 |
US20200304807A1 (en) | 2020-09-24 |
US11463707B2 (en) | 2022-10-04 |
WO2019152283A1 (en) | 2019-08-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11463707B2 (en) | Techniques of multi-hypothesis motion compensation | |
AU2017340631B2 (en) | Motion vector prediction for affine motion models in video coding | |
EP3443748B1 (en) | Conformance constraint for collocated reference index in video coding | |
US11924440B2 (en) | Techniques of multi-hypothesis motion compensation | |
US12273532B2 (en) | Method and apparatus for prediction refinement with optical flow for an affine coded block | |
US12132911B2 (en) | BVD sign inference in IBC based on BV and BVP components | |
KR20210129721A (en) | Method, device, and system for determining prediction weights for merge mode | |
US11895302B2 (en) | Adaptive bilateral matching for decoder side motion vector refinement | |
US11336913B2 (en) | Reference picture re-sampling | |
US20230300363A1 (en) | Systems and methods for template matching for adaptive mvd resolution | |
US20240364870A1 (en) | Boundary Based Asymmetric Reference Line Offsets | |
US20240364864A1 (en) | Method for video encoding/decoding and bitstream transmission | |
US20230362402A1 (en) | Systems and methods for bilateral matching for adaptive mvd resolution | |
WO2024017378A9 (en) | Method, apparatus, and medium for video processing | |
WO2024182669A1 (en) | Template-based weight derivation for block prediction | |
KR20230133775A (en) | Method and apparatus for encoding/decoding image and recording medium for storing bitstream | |
US20240195978A1 (en) | Joint motion vector coding | |
WO2024213142A1 (en) | Method, apparatus, and medium for video processing | |
WO2024078550A1 (en) | Method, apparatus, and medium for video processing | |
WO2024222825A1 (en) | Method, apparatus, and medium for video processing | |
WO2024146616A1 (en) | Method, apparatus, and medium for video processing | |
WO2024235331A1 (en) | Method, apparatus, and medium for video processing | |
WO2024032671A9 (en) | Method, apparatus, and medium for video processing | |
WO2024213018A1 (en) | Method, apparatus, and medium for video processing | |
WO2024114651A9 (en) | Method, apparatus, and medium for video processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: APPLE INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TOURAPIS, ALEXANDROS MICHAEL;SU, YEPING;SINGER, DAVID;AND OTHERS;SIGNING DATES FROM 20190122 TO 20190213;REEL/FRAME:048453/0324 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |