CN117223283A - Intra-mode-dependent multiple transform selection for video coding - Google Patents
Intra-mode-dependent multiple transform selection for video coding
- Publication number
- CN117223283A (application No. CN202280026724.6A)
- Authority
- CN
- China
- Prior art keywords
- mts
- block
- intra
- mode
- current block
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
An example apparatus for decoding video data includes: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a size of a current block of the video data; determine an intra prediction mode for the current block; determine a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each mode group including a respective set of intra prediction modes; determine a set of available Multiple Transform Selection (MTS) schemes for the current block according to the size of the current block and the intra prediction mode; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
Description
The present application claims the benefit of U.S. patent application Ser. No. 17/658,803, filed April 11, 2022, U.S. provisional application Ser. No. 63/173,884, filed April 12, 2021, and U.S. provisional application Ser. No. 63/223,377, filed July 19, 2021, each of which is incorporated herein by reference in its entirety. U.S. patent application Ser. No. 17/658,803 claims the benefit of U.S. provisional application Ser. No. 63/173,884 and U.S. provisional application Ser. No. 63/223,377.
Technical Field
The present disclosure relates to video coding, including video encoding and video decoding.
Background
Digital video capabilities can be incorporated into a wide variety of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, electronic book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video gaming consoles, cellular or satellite radio telephones (so-called "smartphones"), video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC), and extensions of such standards, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1), developed by the Alliance for Open Media. By implementing such video coding techniques, a video device may more efficiently transmit, receive, encode, decode, and/or store digital video information.
Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or eliminate redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be divided into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in slices of an intra-coded (I) picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in a slice of an inter-coded (P or B) picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture, or temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame, and a reference picture may be referred to as a reference frame.
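The prediction/residual relationship described above can be illustrated with a toy sketch. The sample values and 2x2 block size are purely illustrative; a real codec operates on transformed and quantized residuals rather than raw differences:

```python
def residual(block, prediction):
    """Encoder side: the residual is the sample-wise difference
    between the actual block and its prediction."""
    return [[b - p for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(block, prediction)]

def reconstruct(prediction, residual_block):
    """Decoder side: add the (inverse-transformed) residual back
    onto the prediction to recover the block."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual_block)]

actual = [[10, 12], [14, 16]]      # 2x2 block of luma samples
predicted = [[9, 12], [15, 15]]    # prediction from neighboring samples
res = residual(actual, predicted)  # small values, cheap to code
assert reconstruct(predicted, res) == actual
```

The residual carries less energy than the raw samples when the prediction is good, which is what makes the subsequent transform and entropy coding effective.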
Disclosure of Invention
In general, this disclosure describes techniques for selecting Multiple Transform Selection (MTS) schemes for video coding. A video coder may divide a picture into multiple blocks and individually code each block. Coding typically includes forming a prediction block according to a prediction mode and coding a residual block, where the residual block represents a difference between the prediction block and an actual block. The video encoder may apply a transform to the residual block and the video decoder may apply an inverse transform to the transformed block to reproduce the residual block. The MTS scheme includes a variety of transforms, including horizontal transforms and vertical transforms, applied during residual block coding. In accordance with the techniques of this disclosure, a video coder may be configured to select an MTS scheme based on a block size and an intra prediction mode for the block.
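As a rough illustration of this selection flow, the following Python sketch maps a block's size to a set of candidate MTS schemes and then uses the intra mode to pick one. The scheme sets, the 256-sample size threshold, the 67-mode intra layout, and the mode-grouping rule are all illustrative assumptions for this sketch, not values taken from the present disclosure:

```python
# Candidate (horizontal, vertical) transform pairs an MTS scheme may use.
# The pairings below are hypothetical examples.
MTS_SCHEMES = {
    "small": [("DST7", "DST7"), ("DCT8", "DST7"), ("DST7", "DCT8")],
    "large": [("DCT2", "DCT2"), ("DST7", "DST7")],
}

def intra_mode_group(intra_mode, num_groups):
    """Bucket a mode index (0..66) into disjoint ranges, so that no
    mode belongs to more than one group."""
    return min(intra_mode // (67 // num_groups + 1), num_groups - 1)

def select_mts_scheme(width, height, intra_mode):
    """Pick a (horizontal, vertical) transform pair from block size and mode."""
    # Step 1: the block size determines which set of MTS schemes is available.
    schemes = MTS_SCHEMES["small"] if width * height <= 256 else MTS_SCHEMES["large"]
    # Step 2: the intra mode's group indexes a scheme within that set.
    return schemes[intra_mode_group(intra_mode, len(schemes))]
```

For example, an 8x8 block falls in the "small" set while a 32x32 block falls in the "large" set, so the same intra mode can yield different transform pairs at different sizes.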
In some examples, a video coder may determine the MTS scheme from a size group that includes the block's size. For example, the size group may be a range of block sizes. A video coder may be configured with a variety of different size groups, each corresponding to a different MTS scheme. Additionally or alternatively, in some examples, the video coder may determine the MTS scheme from a mode group including the intra prediction mode for the current block. For example, the mode group may be a set of intra prediction modes. A video coder may be configured with a variety of different mode groups, each mode group corresponding to a different MTS scheme. In some examples, the video coder may apply size symmetry to select the MTS scheme. For example, blocks of size MxN (where M and N are unequal integer values) that are predicted using a directional intra prediction mode may be mapped to an MTS scheme, and the video coder may be configured to select the same MTS scheme for NxM blocks predicted using the symmetric directional intra prediction mode.
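The size-symmetry rule described above can be sketched as a table lookup with a transposed fallback. The 67-mode numbering (modes 0 and 1 non-directional, modes 2 to 66 directional and reflected about diagonal mode 34, as in VVC-style layouts) and the table contents are assumptions made only for illustration:

```python
def symmetric_mode(mode):
    """Reflect a directional mode about the diagonal (mode 34);
    non-directional modes (0 = planar, 1 = DC) map to themselves."""
    return mode if mode < 2 else 68 - mode

def lookup_with_symmetry(table, width, height, mode):
    """Look up an MTS scheme; if (width, height, mode) is absent,
    reuse the entry of the transposed block with the symmetric mode."""
    key = (width, height, mode)
    if key in table:
        return table[key]
    return table[(height, width, symmetric_mode(mode))]

# Hypothetical mapping: only the 16x4 entry is stored explicitly.
table = {(16, 4, 10): "scheme_A"}
# A 4x16 block with the symmetric mode 58 reuses scheme_A:
assert lookup_with_symmetry(table, 4, 16, 58) == "scheme_A"
```

Storing only one orientation of each (size, mode) pair roughly halves the mapping table, which is the practical benefit of the symmetry.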
In one example, a method of decoding video data includes: determining a size of a current block of video data; determining an intra-prediction mode for the current block of video data; determining a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from the set of available MTS schemes based on the determined mode group; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decoding the current block using the residual block.
In another example, an apparatus for decoding (and possibly also encoding) video data may comprise: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
In another example, a computer-readable storage medium has instructions stored thereon that, when executed, cause a processor of a device for decoding video data to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
In another example, an apparatus for decoding (and possibly encoding) video data includes: means for determining a size of a current block of video data; means for determining an intra prediction mode for the current block of video data; means for determining a mode group comprising the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups comprising a respective set of intra prediction modes, such that each possible intra prediction mode is not included in more than one of the mode groups; means for determining a set of available Multiple Transform Selection (MTS) schemes for the current block based on the size of the current block and the intra-prediction mode, the set of available MTS schemes being one of a plurality of sets of MTS schemes; means for determining an MTS scheme from the set of available MTS schemes based on the determined mode group; means for applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and means for decoding the current block using the residual block.
The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
Fig. 2 is a conceptual diagram illustrating conventional and wide-angle intra prediction modes.
Fig. 3 is a flow chart of an example of a Matrix Intra Prediction (MIP) procedure.
Fig. 4 is a conceptual diagram illustrating an example of constructing a gradient histogram for decoder-side intra mode derivation (DIMD) with fused intra prediction.
Fig. 5 is a flowchart illustrating an example weight determination and prediction block generation process for DIMD.
Fig. 6 is a conceptual diagram illustrating templates and reference samples for template-based intra mode derivation (TIMD).
Fig. 7 is a block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
Fig. 8 is a block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
Fig. 9 is a flowchart illustrating an example method for encoding a current block in accordance with the techniques of this disclosure.
Fig. 10 is a flowchart illustrating an example method for decoding a current block in accordance with the techniques of this disclosure.
Fig. 11 is a flow chart illustrating another example method of decoding a block of video data in accordance with the techniques of this disclosure.
Fig. 12 is a flowchart illustrating another example method of decoding a block of video data in accordance with the techniques of this disclosure.
Fig. 13 is a flowchart illustrating another example method of decoding a block of video data in accordance with the techniques of this disclosure.
Detailed Description
Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual (MPEG-4 Part 2), ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and ITU-T H.265 (also known as HEVC) and its extensions. At the April 2018 meeting of the Joint Video Experts Team (JVET), the Versatile Video Coding (VVC) standardization activity (also known as ITU-T H.266) began, and video compression technologies submitted in response to a call for proposals were evaluated.
In general, this disclosure describes techniques for selecting Multiple Transform Selection (MTS) schemes for video coding. A video coder may divide a picture into multiple blocks and individually code each block. Coding typically includes forming a prediction block according to a prediction mode and coding a residual block, where the residual block represents a difference between the prediction block and an actual block. The video encoder may apply a transform to the residual block and the video decoder may apply an inverse transform to the transformed block to reproduce the residual block. The MTS scheme includes a variety of transforms, including horizontal transforms and vertical transforms, applied during residual block coding. In accordance with the techniques of this disclosure, a video coder may be configured to select an MTS scheme based on a block size and an intra prediction mode for the block.
Said et al., "CE6.1.1: Extended AMT," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, 10-18 July 2018, document JVET-K0375-v2 (hereinafter "JVET-K0375"), describes an example process for determining the MTS scheme using only the shortest side of non-square blocks. Thus, for example, for MTS determination purposes, 16x4 blocks and 4x4 blocks would be treated identically. Statistically, however, even when such blocks use the same intra prediction mode, their corresponding residual characteristics may differ. Furthermore, matrix intra prediction (MIP) modes may have different residual characteristics than directional intra prediction modes, yet JVET-K0375 does not specify a different set of transforms for MIP mode. The present disclosure describes various techniques for selecting an MTS scheme that may exploit the residual characteristics of blocks of various sizes, considering both the horizontal and vertical dimensions of the block size and also considering MIP modes as possible intra prediction modes. Thus, these techniques may improve video compression without negatively impacting video quality.
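The contrast with JVET-K0375 can be made concrete with a small sketch. The grouping keys here are illustrative only; the point is that a shortest-side key collapses 16x4 and 4x4 blocks into one group, while a per-dimension key keeps them distinct:

```python
def k0375_size_key(width, height):
    """JVET-K0375-style grouping: key the MTS decision on the shortest
    side only, so a 16x4 block and a 4x4 block fall in the same group."""
    return min(width, height)

def per_dimension_size_key(width, height):
    """Per-dimension grouping, as described in this disclosure: keep both
    dimensions, so 16x4 and 4x4 can select different MTS schemes."""
    return (width, height)

# 16x4 and 4x4 collapse under the shortest-side rule...
assert k0375_size_key(16, 4) == k0375_size_key(4, 4)
# ...but remain distinguishable when both dimensions are considered.
assert per_dimension_size_key(16, 4) != per_dimension_size_key(4, 4)
```

Keeping both dimensions lets the coder match the transform set to the typically different residual statistics of wide, tall, and square blocks.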
Fig. 1 is a block diagram illustrating an example video encoding and decoding system 100 that may perform the techniques of this disclosure. The techniques of this disclosure generally relate to coding (encoding and/or decoding) video data. Generally, video data includes any data used to process video. Thus, video data may include raw, uncoded video, encoded video, decoded (e.g., reconstructed) video, and video metadata, such as signaling data.
As shown in fig. 1, in this example, the system 100 includes a source device 102, the source device 102 providing encoded video data to be decoded and displayed by a destination device 116. In particular, the source device 102 provides video data to the destination device 116 via the computer readable medium 110. Source device 102 and destination device 116 may comprise any of a wide range of devices including desktop computers, notebook (i.e., laptop) computers, mobile devices, tablet computers, set-top boxes, telephone handsets such as smartphones, televisions, cameras, display devices, digital media players, video gaming terminals, video streaming devices, broadcast receiver devices, and the like. In some cases, the source device 102 and the destination device 116 may be equipped for wireless communication, and thus may be referred to as wireless communication devices.
In the example of fig. 1, source device 102 includes video source 104, memory 106, video encoder 200, and output interface 108. Destination device 116 includes input interface 122, video decoder 300, memory 120, and display device 118. In accordance with the present disclosure, the video encoder 200 of the source device 102 and the video decoder 300 of the destination device 116 may be configured to apply techniques for determining a Multiple Transform Selection (MTS) scheme based on the size of the current block and the intra prediction mode. Thus, source device 102 represents an example of a video encoding device, and destination device 116 represents an example of a video decoding device. In other examples, the source device and the destination device may include other components or arrangements. For example, source device 102 may receive video data from an external video source, such as an external camera. Likewise, the destination device 116 may interface with an external display device instead of including an integrated display device.
The system 100 as shown in fig. 1 is only one example. In general, any digital video encoding and/or decoding apparatus may perform techniques for determining a Multiple Transform Selection (MTS) scheme based on the size of a current block and an intra prediction mode. Source device 102 and destination device 116 are merely examples of such transcoding devices in which source device 102 generates transcoded video data for transmission to destination device 116. The present disclosure refers to a "transcoding" device as a device that performs transcoding (e.g., encoding and/or decoding) of data. Thus, the video encoder 200 and the video decoder 300 represent examples of decoding apparatuses, and in particular, represent a video encoder and a video decoder, respectively. In some examples, source device 102 and destination device 116 may operate in a substantially symmetrical manner such that each of source device 102 and destination device 116 include video encoding and decoding components. Thus, the system 100 may support unidirectional or bidirectional video transmission between the source device 102 and the destination device 116, for example, for video streaming, video playback, video broadcasting, or video telephony.
In general, video source 104 represents a source of video data (i.e., original, non-coded video data) and provides a sequential series of pictures (also referred to as "frames") of the video data to video encoder 200, which video encoder 200 encodes the data for the pictures. The video source 104 of the source device 102 may include a video capture device such as a video camera, a video archiving unit containing previously captured raw video, and/or a video feed interface for receiving video from a video content provider. As a further alternative, video source 104 may generate computer graphics-based data as the source video, or a combination of real-time video, archived video, and computer-generated video. In each case, the video encoder 200 encodes captured, pre-captured, or computer-generated video data. Video encoder 200 may rearrange pictures from the received order (sometimes referred to as the "display order") to a coding order for coding. The video encoder 200 may generate a bitstream including the encoded video data. The source device 102 may then output the encoded video data onto the computer readable medium 110 via the output interface 108, for receipt and/or retrieval by, for example, the input interface 122 of the destination device 116.
The memory 106 of the source device 102 and the memory 120 of the destination device 116 represent general purpose memory. In some examples, the memories 106, 120 may store raw video data, e.g., raw video from the video source 104 and raw decoded video data from the video decoder 300. Additionally or alternatively, the memories 106, 120 may store software instructions executable by, for example, the video encoder 200 and the video decoder 300, respectively. Although memory 106 and memory 120 are shown separately from video encoder 200 and video decoder 300 in this example, it should be understood that video encoder 200 and video decoder 300 may also include internal memory for functionally similar or equivalent purposes. Furthermore, the memories 106, 120 may store encoded video data, e.g., output from the video encoder 200 and input to the video decoder 300. In some examples, portions of the memory 106, 120 may be allocated as one or more video buffers, e.g., to store raw decoded and/or encoded video data.
Computer-readable medium 110 may represent any type of medium or device capable of transmitting encoded video data from source device 102 to destination device 116. In one example, the computer-readable medium 110 represents a communication medium that enables the source device 102 to directly transmit encoded video data to the destination device 116 in real-time, e.g., via a radio frequency network or a computer-based network. The output interface 108 may modulate a transmission signal including encoded video data and the input interface 122 may demodulate a received transmission signal according to a communication standard such as a wireless communication protocol. The communication medium may include any wireless or wired communication medium such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network such as: a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device that may be useful for facilitating communication from the source device 102 to the destination device 116.
In some examples, source device 102 may output encoded data from output interface 108 to storage device 112. Similarly, destination device 116 may access encoded data from storage device 112 via input interface 122. Storage device 112 may include any of a variety of distributed or locally accessed data storage media such as hard drives, blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data.
In some examples, source device 102 may output the encoded video data to file server 114 or another intermediate storage device that may store the encoded video data generated by source device 102. The destination device 116 may access the stored video data from the file server 114 via streaming or download.
File server 114 may be any type of server device capable of storing encoded video data and transmitting the encoded video data to destination device 116. File server 114 may represent a web server (e.g., for a website), a server configured to provide file transfer protocol services (e.g., File Transfer Protocol (FTP) or File Delivery over Unidirectional Transport (FLUTE) protocol), a Content Delivery Network (CDN) device, a Hypertext Transfer Protocol (HTTP) server, a Multimedia Broadcast Multicast Service (MBMS) or enhanced MBMS (eMBMS) server, and/or a Network Attached Storage (NAS) device. The file server 114 may additionally or alternatively implement one or more HTTP streaming protocols, such as Dynamic Adaptive Streaming over HTTP (DASH), HTTP Live Streaming (HLS), Real-Time Streaming Protocol (RTSP), HTTP Dynamic Streaming, and the like.
The destination device 116 may access the encoded video data from the file server 114 over any standard data connection, including an internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., Digital Subscriber Line (DSL), cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on file server 114. The input interface 122 may be configured to operate in accordance with any one or more of the various protocols discussed above for retrieving or receiving media data from the file server 114, or other such protocols for retrieving media data.
Output interface 108 and input interface 122 may represent a wireless transmitter/receiver, a modem, a wired networking component (e.g., an Ethernet card), a wireless communication component operating according to any of a variety of IEEE 802.11 standards, or other physical components. In examples where output interface 108 and input interface 122 include wireless components, output interface 108 and input interface 122 may be configured to transmit data (e.g., encoded video data) according to a cellular communication standard (e.g., 4G-LTE (Long Term Evolution), LTE-Advanced, 5G, etc.). In some examples where output interface 108 includes a wireless transmitter, output interface 108 and input interface 122 may be configured to transmit data (e.g., encoded video data) according to other wireless standards, such as the IEEE 802.11 specification, the IEEE 802.15 specification (e.g., ZigBee™), the Bluetooth™ standard, and the like. In some examples, source device 102 and/or destination device 116 may include respective system-on-chip (SoC) devices. For example, source device 102 may include an SoC device for performing functions attributed to video encoder 200 and/or output interface 108, and destination device 116 may include an SoC device for performing functions attributed to video decoder 300 and/or input interface 122.
The techniques of this disclosure may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, internet streaming video transmission (such as dynamic adaptive streaming over HTTP (DASH)), digital video encoded onto a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
The input interface 122 of the destination device 116 receives the encoded video bitstream from the computer readable medium 110 (e.g., communication medium, storage device 112, file server 114, etc.). The encoded video bitstream may include signaling information (which is also used by the video decoder 300) defined by the video encoder 200 such as the following syntax elements: the syntax elements have values that describe characteristics and/or processing of a video block or other coding unit (e.g., slice, picture, group of pictures, sequence, etc.). The display device 118 displays the decoded pictures of the decoded video data to a user. Display device 118 may represent any of a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or another type of display device.
Although not shown in fig. 1, in some examples, both the video encoder 200 and the video decoder 300 may be integrated with an audio encoder and/or an audio decoder, and may include appropriate MUX-DEMUX units or other hardware and/or software to process multiplexed streams that include both audio and video in a common data stream.
Video encoder 200 and video decoder 300 may each be implemented as any of a variety of suitable encoder and/or decoder circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. When the techniques are implemented in part in software, a device may store instructions for the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of the video encoder 200 and the video decoder 300 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. Devices including video encoder 200 and/or video decoder 300 may include integrated circuits, microprocessors, and/or wireless communication devices (such as cellular telephones).
Video encoder 200 and video decoder 300 may operate in accordance with a video coding standard, such as ITU-T H.265 (also known as High Efficiency Video Coding (HEVC)) or an extension thereto, such as a multiview and/or scalable video coding extension. Alternatively, the video encoder 200 and video decoder 300 may operate in accordance with other proprietary or industry standards, such as ITU-T H.266, also known as Versatile Video Coding (VVC). In other examples, the video encoder 200 and video decoder 300 may operate in accordance with proprietary video codecs/formats, such as AOMedia Video 1 (AV1), extensions of AV1, and/or successor versions of AV1 (e.g., AV2). In other examples, video encoder 200 and video decoder 300 may operate in accordance with other proprietary formats or industry standards. However, the techniques of this disclosure are not limited to any particular coding standard or format. In general, video encoder 200 and video decoder 300 may be configured to perform the techniques of this disclosure in conjunction with any video coding technique that determines a Multiple Transform Selection (MTS) scheme based on the current block size and intra-prediction mode.
In general, video encoder 200 and video decoder 300 may perform block-based coding of pictures. The term "block" generally refers to a structure that includes data to be processed (e.g., to be encoded, decoded, or otherwise used in an encoding and/or decoding process). For example, a block may comprise a two-dimensional matrix of samples of luminance and/or chrominance data. In general, video encoder 200 and video decoder 300 may code video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, rather than coding red, green, and blue (RGB) data for samples of a picture, the video encoder 200 and the video decoder 300 may code luminance and chrominance components, where the chrominance components may include both red-hue and blue-hue chrominance components. In some examples, the video encoder 200 converts received RGB-formatted data to a YUV representation prior to encoding, and the video decoder 300 converts the YUV representation back to the RGB format. Alternatively, a pre-processing and post-processing unit (not shown) may perform these conversions.
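The RGB-to-YUV conversion mentioned above can be sketched as follows. The disclosure does not fix a particular conversion matrix, so this sketch assumes the common BT.601 coefficients; the function name and the 8-bit value range are illustrative choices, not taken from the document.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample triple to (Y, Cb, Cr) using the BT.601
    coefficients (an assumed convention; the disclosure names none)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b       # luminance
    cb = 128 + 0.564 * (b - y)                  # blue-hue chrominance component
    cr = 128 + 0.713 * (r - y)                  # red-hue chrominance component
    return round(y), round(cb), round(cr)
```

For example, a pure white sample maps to maximum luminance with neutral (mid-range) chrominance, which is why chroma subsampling of Cb/Cr costs little visual quality.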
The present disclosure may generally relate to coding (e.g., encoding and decoding) of a picture to include a process of encoding or decoding data of the picture. Similarly, the present disclosure may relate to coding a block of a picture to include a process of encoding or decoding (e.g., prediction and/or residual coding) data for the block. The encoded video bitstream typically includes a series of values for syntax elements representing coding decisions (e.g., coding modes) and picture-to-block partitioning. Thus, references to coding of a picture or block should generally be understood as coding values of syntax elements forming the picture or block.
HEVC defines various blocks, including Coding Units (CUs), Prediction Units (PUs), and Transform Units (TUs). According to HEVC, a video coder, such as video encoder 200, partitions Coding Tree Units (CTUs) into CUs according to a quadtree structure. That is, the video coder partitions CTUs and CUs into four equal, non-overlapping squares, and each node of the quadtree has either zero or four child nodes. A node without child nodes may be referred to as a "leaf node," and a CU of such a leaf node may include one or more PUs and/or one or more TUs. The video coder may further partition the PUs and TUs. For example, in HEVC, a Residual Quadtree (RQT) represents a partitioning of TUs. In HEVC, PUs represent inter-prediction data, while TUs represent residual data. An intra-predicted CU includes intra-prediction information, such as an intra-mode indication.
As another example, the video encoder 200 and the video decoder 300 may be configured to operate according to VVC. According to VVC, a video coder, such as video encoder 200, divides a picture into a plurality of Coding Tree Units (CTUs). The video encoder 200 may divide the CTUs according to a tree structure, such as a quadtree-binary tree (QTBT) structure or a multi-type tree (MTT) structure. The QTBT structure removes the concept of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. The QTBT structure includes two levels: a first level of partitioning according to a quadtree partitioning, and a second level of partitioning according to a binary tree partitioning. The root node of the QTBT structure corresponds to the CTU. Leaf nodes of the binary tree correspond to Coding Units (CUs).
In the MTT partitioning structure, blocks may be partitioned using a Quadtree (QT) partition, a Binary Tree (BT) partition, and one or more types of Triple Tree (TT) (also referred to as Ternary Tree (TT)) partitions. A triple or ternary tree partition is a partition in which a block is split into three sub-blocks. In some examples, a triple or ternary tree partition divides a block into three sub-blocks without dividing the original block through the center. The partition types in MTT (e.g., QT, BT, and TT) may be symmetric or asymmetric.
When operating according to the AV1 codec, the video encoder 200 and the video decoder 300 may be configured to code video data in blocks. In AV1, the largest coding block that can be processed is called a superblock. In AV1, the superblock may be 128x128 luma samples or 64x64 luma samples. However, in successor video coding formats (e.g., AV2), the superblock may be defined by a different (e.g., larger) luma sample size. In some examples, the superblock is the top level of a block quadtree. Video encoder 200 may further partition the superblock into smaller coding blocks. The video encoder 200 may use square or non-square partitioning to partition superblocks and other coding blocks into smaller blocks. Non-square blocks may include N/2xN, NxN/2, N/4xN, and NxN/4 blocks. The video encoder 200 and the video decoder 300 may perform separate prediction and transform processes for each coding block.
AV1 also defines tiles of video data. A tile is a rectangular array of super blocks that can be coded independently of other tiles. That is, video encoder 200 and video decoder 300 may encode and decode, respectively, the coding blocks within a tile without using video data from other tiles. However, video encoder 200 and video decoder 300 may perform filtering across tile boundaries. The tiles may be uniform in size or non-uniform in size. Tile-based coding may enable parallel processing and/or multithreading for encoder and decoder implementations.
In some examples, the video encoder 200 and the video decoder 300 may use a single QTBT or MTT structure to represent each of the luma component and the chroma component, while in other examples, the video encoder 200 and the video decoder 300 may use two or more QTBT or MTT structures, such as one QTBT/MTT structure for the luma component and another QTBT/MTT structure for the two chroma components (or two QTBT/MTT structures for the respective chroma components).
The video encoder 200 and video decoder 300 may be configured to use quadtree partitioning, QTBT partitioning, MTT partitioning, superblock partitioning, or other partitioning structures.
In some examples, the CTUs include a Coding Tree Block (CTB) of luma samples, two corresponding CTBs of chroma samples of a picture having three sample arrays, or CTBs of monochrome pictures or samples of pictures coded using three separate color planes and syntax structures for coding the samples. CTBs may be blocks of NxN samples of some value of N such that dividing a component into CTBs is a partition. A component is an array or single sample from one of three arrays (one luminance and two chromaticities) that make up a picture in a 4:2:0, 4:2:2, or 4:4:4 color format, or an array or single sample of an array that makes up a picture in a monochrome format. In some examples, the coding block is a block of MxN samples for some values of M and N, such that dividing CTBs into coding blocks is a partition.
Blocks (e.g., CTUs or CUs) may be grouped in pictures in various ways. As one example, a brick (brick) may refer to a rectangular region of CTU rows within a particular tile in a picture. A tile may be a rectangular region of CTUs within a particular tile column and a particular tile row in a picture. A tile column refers to a rectangular region of a CTU having a height equal to the height of a picture and a width specified by syntax elements (e.g., such as in a picture parameter set). A tile row refers to a rectangular region of a CTU having a height specified by a syntax element (e.g., such as in a picture parameter set) and a width equal to the width of a picture.
In some examples, a tile may be divided into a plurality of bricks, each of which may include one or more CTU rows within the tile. A tile that is not partitioned into multiple bricks may also be referred to as a brick. However, a brick that is a true subset of a tile may not be referred to as a tile. The bricks in a picture may also be arranged in slices. A slice may be an integer number of bricks of a picture, which may be exclusively contained in a single Network Abstraction Layer (NAL) unit. In some examples, a slice includes either a number of complete tiles or only a continuous sequence of complete bricks of one tile.
The present disclosure may use "NxN" and "N by N" interchangeably to refer to the sample dimensions of a block (such as a CU or other video block) in the vertical and horizontal dimensions, e.g., 16x16 samples or 16 by 16 samples. Typically, a 16x16 CU will have 16 samples in the vertical direction (y = 16) and 16 samples in the horizontal direction (x = 16). Likewise, an NxN CU typically has N samples in the vertical direction and N samples in the horizontal direction, where N represents a non-negative integer value. Samples in a CU may be arranged in rows and columns. Furthermore, a CU need not have the same number of samples in the horizontal direction as in the vertical direction. For example, a CU may include NxM samples, where M is not necessarily equal to N.
The video encoder 200 encodes video data representing prediction and/or residual information and other information for the CU. The prediction information indicates how the CU is to be predicted in order to form a prediction block of the CU. Residual information generally represents a sample-by-sample difference between samples of a CU before encoding and a prediction block.
To predict a CU, video encoder 200 may typically form a prediction block of the CU by inter-prediction or intra-prediction. Inter-prediction generally refers to predicting a CU from data of a previously coded picture, while intra-prediction generally refers to predicting a CU from previously coded data of the same picture. To perform inter prediction, video encoder 200 may use one or more motion vectors to generate a prediction block. Video encoder 200 may typically perform a motion search to identify reference blocks that closely match the CU, e.g., according to differences between the CU and the reference blocks. The video encoder 200 may calculate a difference metric using a Sum of Absolute Differences (SAD), a Sum of Squared Differences (SSD), a Mean Absolute Difference (MAD), a Mean Squared Difference (MSD), or other such difference calculation to determine whether the reference block closely matches the current CU. In some examples, video encoder 200 may use unidirectional prediction or bi-directional prediction to predict the current CU.
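The difference metrics named above (SAD and SSD) can be sketched directly; the function and variable names here are illustrative, not taken from the disclosure.

```python
def sad(cur, ref):
    """Sum of Absolute Differences between a current block and a candidate
    reference block, each given as a flat list of samples."""
    return sum(abs(c - r) for c, r in zip(cur, ref))

def ssd(cur, ref):
    """Sum of Squared Differences for the same two blocks; squaring
    penalizes large per-sample errors more heavily than SAD does."""
    return sum((c - r) ** 2 for c, r in zip(cur, ref))
```

During motion search, the encoder evaluates such a metric for each candidate reference block and keeps the candidate with the smallest cost.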
Some examples of VVCs also provide affine motion compensation modes, which may be considered inter prediction modes. In affine motion compensation mode, the video encoder 200 may determine two or more motion vectors representing non-translational motion (such as zoom-in or zoom-out, rotation, perspective motion, or other irregular types of motion).
To perform intra prediction, the video encoder 200 may select an intra prediction mode to generate a prediction block. Some examples of VVCs provide sixty-seven intra-prediction modes, including various directional modes, as well as planar modes and DC modes. In general, the video encoder 200 selects an intra prediction mode that describes neighboring samples of a current block (e.g., a block of a CU), from which samples of the current block are predicted. Assuming that video encoder 200 codes CTUs and CUs in raster scan order (left to right, top to bottom), such samples may typically be above, above left, or to the left of the current block in the same picture as the current block.
The video encoder 200 encodes data representing a prediction mode of the current block. For example, for inter prediction modes, video encoder 200 may encode data representing which of the various available inter prediction modes to use, as well as motion information for the corresponding modes. For example, for unidirectional or bi-directional inter prediction, the video encoder 200 may encode the motion vectors using Advanced Motion Vector Prediction (AMVP) or merge mode. The video encoder 200 may use a similar mode to encode motion vectors for affine motion compensation modes.
AV1 includes two general techniques for encoding and decoding coding blocks of video data. These two general techniques are intra-prediction (e.g., intra-prediction or spatial prediction) and inter-prediction (e.g., inter-prediction or temporal prediction). In the context of AV1, when a block of a current frame of video data is predicted using an intra-prediction coding mode, the video encoder 200 and the video decoder 300 do not use video data from other frames of video data. For most intra-prediction coding modes, the video encoder 200 encodes a block of the current frame based on the difference between sample values in the current block and prediction values generated from reference samples in the same frame. The video encoder 200 determines a prediction value generated from the reference samples based on the intra-prediction coding mode.
After prediction (such as intra-prediction or inter-prediction of a block), video encoder 200 may calculate residual data for the block. Residual data, such as a residual block, represents a sample-by-sample difference between the block and a prediction block for the block, which is formed using the corresponding prediction mode. The video encoder 200 may apply one or more transforms to the residual block to produce transformed data in a transform domain instead of in the sample domain. For example, video encoder 200 may apply a Discrete Cosine Transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to the residual video data. In addition, the video encoder 200 may apply a secondary transform, such as a mode-dependent non-separable secondary transform (MDNSST), a signal-dependent transform, a Karhunen-Loeve Transform (KLT), or the like, after the first transform. In some examples, video encoder 200 and video decoder 300 may be configured to perform a Multiple Transform Selection (MTS) scheme, which may include applying both horizontal and vertical transforms to blocks. The video encoder 200 generates transform coefficients after applying the one or more transforms.
As noted above, after generating any transform of the transform coefficients, the video encoder 200 may perform quantization of the transform coefficients. Quantization generally refers to the process of: wherein the transform coefficients are quantized to possibly reduce the amount of data representing the transform coefficients, thereby providing further compression. By performing the quantization process, the video encoder 200 may reduce the bit depth associated with some or all of the transform coefficients. For example, the video encoder 200 may round the value of n bits down to a value of m bits during quantization, where n is greater than m. In some examples, to perform quantization, video encoder 200 may perform a bitwise right shift of the value to be quantized.
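The bitwise-right-shift form of quantization described above can be sketched as follows for non-negative coefficients (a minimal illustration; practical quantizers also handle signs, rounding offsets, and quantization parameters, none of which are shown here).

```python
def quantize(coeff, shift):
    """Reduce an n-bit transform coefficient to m = n - shift bits by a
    bitwise right shift, one simple form of the quantization described
    above (non-negative coefficients assumed for this sketch)."""
    return coeff >> shift

def dequantize(level, shift):
    """Approximate inverse used by the decoder: scale the level back up.
    The low-order bits discarded by quantize() are lost; that loss is
    where the compression gain comes from."""
    return level << shift
```

For example, with shift = 4 a coefficient of 1000 becomes level 62, and dequantization reconstructs 992 rather than 1000: the 8 units of error are the quantization distortion.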
After quantization, the video encoder 200 may scan the transform coefficients to generate a one-dimensional vector from a two-dimensional matrix comprising quantized transform coefficients. The scan may be designed to place higher energy (and thus lower frequency) transform coefficients in front of the vector and lower energy (and thus higher frequency) transform coefficients in back of the vector. In some examples, video encoder 200 may scan quantized transform coefficients using a predefined scan order to generate a serialized vector, and then entropy encode the quantized transform coefficients of the vector. In other examples, video encoder 200 may perform adaptive scanning. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 200 may entropy encode the one-dimensional vector, e.g., according to context-adaptive binary arithmetic coding (CABAC). The video encoder 200 may also entropy encode values of syntax elements that describe metadata associated with the encoded video data that was used by the video decoder 300 in decoding the video data.
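A scan that places low-frequency coefficients at the front of the vector can be sketched as a simple anti-diagonal traversal. This is an illustrative order only; each codec defines its own fixed scan patterns, and the names below are not from the disclosure.

```python
def diagonal_scan(block):
    """Serialize a 2D coefficient matrix into a 1D list by visiting
    anti-diagonals outward from the top-left (lowest-frequency) corner,
    so higher-energy coefficients tend to land at the front."""
    h, w = len(block), len(block[0])
    out = []
    for s in range(h + w - 1):       # s is the constant row + col of a diagonal
        for row in range(h):
            col = s - row
            if 0 <= col < w:
                out.append(block[row][col])
    return out
```

After such a scan, the trailing run of the vector is typically zeros (quantized high-frequency coefficients), which entropy coding can represent very compactly.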
To perform CABAC, the video encoder 200 may assign contexts within the context model to symbols to be transmitted. The context may relate to, for example, whether the adjacent value of the symbol is a zero value. The probability determination may be based on the context assigned to the symbol.
Video encoder 200 may further generate syntax data, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, for video decoder 300, such as in a picture header, a block header, a slice header, or other syntax data, such as a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), or a Video Parameter Set (VPS). The video decoder 300 may also decode such syntax data to determine how to decode the corresponding video data.
In this way, the video encoder 200 may generate a bitstream including encoded video data, e.g., syntax elements describing the partitioning of pictures into blocks (e.g., CUs) and prediction and/or residual information for the blocks. Finally, the video decoder 300 may receive the bitstream and decode the encoded video data.
In general, the video decoder 300 performs a process inverse to that performed by the video encoder 200 to decode encoded video data of a bitstream. For example, the video decoder 300 may use CABAC to decode values of syntax elements for the bitstream in a substantially similar, but opposite manner to the CABAC encoding process of the video encoder 200. The syntax element may define partition information for partitioning a picture into CTUs and partitioning each CTU according to a corresponding partition structure such as a QTBT structure to define CUs of the CTUs. Syntax elements may further define prediction and residual information for a block of video data (e.g., a CU).
The residual information may be represented by, for example, quantized transform coefficients. The video decoder 300 may inverse quantize and inverse transform the quantized transform coefficients of the block to reproduce a residual block for the block. The video decoder 300 uses the signaled prediction mode (intra prediction or inter prediction) and related prediction information (e.g., motion information for inter prediction) to form a prediction block for a block. The video decoder 300 may then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block. The video decoder 300 may perform additional processing, such as performing a deblocking process to reduce visual artifacts along the boundaries of the blocks.
The present disclosure may generally refer to "signaling" certain information, such as syntax elements. The term "signaling" may generally refer to the communication of values for syntax elements and/or other data used for decoding encoded video data. That is, the video encoder 200 may signal values for syntax elements in the bitstream. Typically, signaling refers to generating a value in the bitstream. As noted above, source device 102 may transport the bitstream to destination device 116 substantially in real time, or not in real time, such as might occur when syntax elements are stored to storage device 112 for later retrieval by destination device 116.
As described above, the video encoder 200 and the video decoder 300 may be configured to apply an MTS scheme to a current block. For example, the video encoder 200 may apply an MTS scheme (including a horizontal transform and a vertical transform) to a residual block, and the video decoder 300 may apply the MTS scheme to a transformed block to reconstruct the residual block. In accordance with the techniques of this disclosure, the MTS scheme may correspond to one of a set of available MTS schemes, where video encoder 200 and video decoder 300 may select the set of available MTS schemes from among multiple sets of MTS schemes based on the size of the current block and the intra-prediction mode of the current block.
Fig. 2 is a conceptual diagram illustrating conventional and wide-angle intra prediction modes. To capture any edge direction presented in natural video, the number of directional intra modes in VTM5 extends from 33 used in HEVC to 65. The new directional mode in VVC is depicted in fig. 2, and the planar mode and DC mode remain the same as in HEVC. These dense directional intra prediction modes are applicable to all block sizes in VVC as well as both luma and chroma intra predictions.
The conventional (or "normal") angular intra-prediction directions are defined in HEVC as ranging from 45 degrees to -135 degrees in the clockwise direction, corresponding to modes 2 through 66 in fig. 2. To provide better predictions for non-square blocks, angles beyond the 45-degree to -135-degree range are included in VVC, shown as modes [67, 80] and modes [-1, -14] in fig. 2. These modes may be referred to as "wide-angle" modes. For blocks with a width (W) greater than a height (H), the modes [67, 80] are considered, and for blocks with a width (W) less than a height (H), the modes [-1, -14] are considered. These directional intra-prediction modes may be used in combination with Multiple Reference Lines (MRL) or in combination with the intra sub-partition (ISP) mode. See J. Chen, Y. Ye, S. Kim, "Algorithm description for Versatile Video Coding and Test Model 10 (VTM 10)," 19th JVET Meeting, by teleconference, July 2020, JVET-S2002, and B. Bross, J. Chen, S. Liu, "Versatile Video Coding (Draft 10)," 19th JVET Meeting, by teleconference, July 2020, JVET-S2001.
Fig. 3 is a flow chart illustrating an example of a Matrix Intra Prediction (MIP) procedure. The matrix weighted intra prediction (MIP) method is an intra-prediction technique in VVC. To predict the samples of a rectangular block 129 of width W and height H, a video coder (e.g., video encoder 200 or video decoder 300) performing matrix weighted intra prediction (MIP) takes as input one column of H reconstructed neighboring boundary samples (samples 130B) to the left of block 129 and one row of W reconstructed neighboring boundary samples (samples 130A) above block 129. If the reconstructed samples are not available, the video coder generates values for them as in conventional intra prediction. The generation of the prediction signal is based on three steps: averaging, matrix-vector multiplication, and linear interpolation, as shown in fig. 3.
In particular, the video coder may average samples 130B to form average samples 132B and average samples 130A to form average samples 132A. The video coder may then perform matrix vector multiplication using the average samples 132A, 132B to form an intermediate prediction block 136. The video coder may then perform linear interpolation on the samples of the intermediate prediction block 136 to form a prediction block 138.
There are three different size indices for the MIP procedure in VVC, with the index idx = idx(W, H) defined based on the block width W and height H.
For idx = 0, 1, and 2, respectively, 16, 12, and 6 matrices are defined, which also determine the number of modes for the given idx. Furthermore, each mode may be transposed, in which case the samples from the left and from above are swapped before the matrix-vector multiplication is performed. Thus, when coding a CU with MIP, the video coder may additionally code a transpose flag (along with the mode signaling) to indicate whether the mode is transposed.
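The mapping idx(W, H) itself is not reproduced in the text above. The sketch below follows the MIP size-class derivation of the published VVC specification; it is supplied here as an assumption about what the omitted definition is, not quoted from this document.

```python
def mip_size_idx(w, h):
    """MIP size index idx(W, H).  This mapping follows the published VVC
    specification's MipSizeId derivation (an assumption here; the text
    above omits the definition)."""
    if w == 4 and h == 4:
        return 0                     # smallest blocks: 4x4
    if w == 4 or h == 4 or (w == 8 and h == 8):
        return 1                     # 4xN, Nx4, and 8x8 blocks
    return 2                         # all larger blocks
```

Under this mapping an 8x8 block and a 4x16 block share size index 1, while a 16x16 block uses index 2.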
Fig. 4 is a conceptual diagram illustrating an example of constructing histograms 140A, 140B for the gradient calculation of decoder-side intra mode derivation with fusion intra prediction (DIMD). Fig. 5 is a flow chart illustrating an example weight determination and prediction block generation process for DIMD. Abdoli et al., "Non-CE3: Decoder-side Intra Mode Derivation with Prediction Fusion Using Planar," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 15th Meeting: Gothenburg, Sweden, 3-12 July 2019, document no. JVET-O0449-v2, describes performing intra prediction based on a decoder-derived intra mode (using already-decoded neighboring reconstructed samples) and fusing it with planar prediction samples. In JVET-O0449, two angular modes are selected from a histogram of gradients (HoG) that is calculated from neighboring pixels of the current block. Once the two angular modes are selected, their predictors are computed using conventional angular Intra Prediction Modes (IPMs) and fused with the planar predictor to form the final predictor of the block. The weight of the planar mode is kept at 21/64 (approximately 1/3), and the remaining 43/64 is distributed proportionally to the two angular modes based on their corresponding amplitudes in the HoG. The HoG is calculated by sliding a 3x3 window along the left and above neighboring reconstructed samples, as shown in fig. 4. The final prediction block 150 may be calculated as a weighted combination of the prediction blocks formed with intra-prediction modes M1 and M2 and the planar mode.
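The DIMD weight split just described (planar fixed at 21/64, the remaining 43/64 shared in proportion to the HoG amplitudes) can be sketched as follows; the function and variable names are illustrative, not from the document.

```python
def dimd_weights(amp1, amp2):
    """Prediction weights for DIMD fusion: the planar mode keeps 21/64,
    and the remaining 43/64 is shared by the two selected angular modes
    in proportion to their HoG amplitudes amp1 and amp2."""
    w_planar = 21 / 64
    remaining = 43 / 64
    w1 = remaining * amp1 / (amp1 + amp2)   # weight of angular mode M1
    w2 = remaining * amp2 / (amp1 + amp2)   # weight of angular mode M2
    return w_planar, w1, w2
```

The three weights always sum to one, so the fused predictor is a convex combination of the planar block and the two angular prediction blocks.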
Fig. 6 is a conceptual diagram illustrating templates and reference samples for template-based intra mode derivation (TIMD). Wang et al., "EE2-related: Template-based intra mode derivation using MPMs," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 22nd Meeting, by teleconference, 20-28 April 2021, document no. JVET-V0098-v2, proposes another decoder-side intra mode derivation method, namely template-based intra mode derivation.
Fig. 6 depicts the general idea of TIMD. Given the current CU 160, a video coder (e.g., video encoder 200 or video decoder 300) selects two template regions (e.g., above the current CU 160 and to the left of the current CU 160), and selects reference samples for the templates accordingly. For each mode in the MPM list, the video coder may generate a prediction for the template region and calculate a Sum of Absolute Transformed Differences (SATD) cost between the prediction and the reconstructed samples of the template region. The video coder may select the mode with the lowest cost as the TIMD mode. In addition, the video coder may use a set of angular intra modes (including wide-angle modes) that is extended (doubled) compared to VVC, i.e., the angles are arranged at twice the density.
Furthermore, Cao et al., "EE2-related: Fusion for template-based intra mode derivation," Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 23rd Meeting, by teleconference, 7-16 July 2021, document no. JVET-W0123-v2, proposed fusion for TIMD. According to JVET-W0123, instead of selecting only the one mode with the minimum SATD cost, the video coder may select the first two modes with the minimum SATD costs for the intra mode derived using the TIMD method, and then fuse the two modes with weights. The video coder may use such weighted intra prediction to code the current CU. The video coder may compare the costs of the two selected modes against a threshold, applying a cost factor of 2, e.g., as follows:
costMode2 < 2 * costMode1
If this condition is true, the video coder may apply fusion; otherwise, the video coder uses only mode 1.
The video coder may calculate the weights for the modes from the SATD costs for the modes as follows:
weight1 = costMode2 / (costMode1 + costMode2)
weight2 = 1 - weight1
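The fusion decision and the weight formulas above can be sketched together as follows (illustrative function names; the two inputs are the two smallest SATD costs from the template search).

```python
def timd_fuse(cost_mode1, cost_mode2):
    """Decide between single-mode TIMD and two-mode fusion, returning the
    per-mode weights, where cost_mode1 <= cost_mode2 are the two smallest
    SATD costs from the template search described above."""
    if cost_mode2 < 2 * cost_mode1:             # fusion condition
        weight1 = cost_mode2 / (cost_mode1 + cost_mode2)
        weight2 = 1 - weight1
        return weight1, weight2
    return 1.0, 0.0                             # only mode 1 is used
```

Note that the cheaper mode receives the larger weight: weight1 is computed from costMode2, so a much cheaper mode 1 dominates the fused prediction.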
in addition to the DCT-II already employed in HEVC, a Multiple Transform Selection (MTS) scheme is used for residual coding of both inter-coded blocks and intra-coded blocks in VVC. The MTS scheme uses a number of transforms selected from, for example, DCT8/DST 7. The newly introduced transformation matrices are DST-7 and DCT-8. Both transformation kernels can be applied to both vertical and horizontal transforms, which correspond to 4 different combinations of horizontal transforms (trHor) and vertical transforms (trVer), as follows:
{trVer,trHor}={DST7,DST7},{DST7,DCT8},{DCT8,DST7},{DCT8,DCT8}
In JVET-O0449, a flag (cu_mts_flag) is signaled to indicate whether DCT2 is used for both trHor and trVer (cu_mts_flag = 0) or not used for both trHor and trVer (cu_mts_flag = 1) for a given coding unit. If not, another syntax element, named cu_mts_idx, is signaled to indicate which of the four DST7/DCT8 combinations is used.
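The signaling just described can be sketched as a lookup from the two syntax elements to a transform pair. The ordering of cu_mts_idx values against the four pairs is an assumption for illustration; the text lists the four combinations without fixing their signaled order.

```python
# The four DST7/DCT8 combinations listed above, indexed by cu_mts_idx.
# The index-to-pair ordering is assumed, not specified by the text.
MTS_PAIRS = [("DST7", "DST7"), ("DST7", "DCT8"),
             ("DCT8", "DST7"), ("DCT8", "DCT8")]   # (trVer, trHor)

def select_transforms(cu_mts_flag, cu_mts_idx=0):
    """Map the signaled syntax elements to a (trVer, trHor) pair."""
    if cu_mts_flag == 0:
        return ("DCT2", "DCT2")     # DCT2 used for both directions
    return MTS_PAIRS[cu_mts_idx]    # one of the four DST7/DCT8 pairs
```

A decoder would parse cu_mts_idx only when cu_mts_flag is 1, mirroring the conditional signaling in the bitstream.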
JVET-K0375 describes additional transform kernels, including DCT5, DST1, DST4, and the identity transform. 7 transform sets are defined, each having 4 different transform pairs (for {trVer, trHor}). A lookup table is defined to assign one of the 7 transform sets based on the intra-prediction mode and block size. The 7 transform sets are defined as:
T0,intra = {(DST-4, DST-4), (DST-7, DST-7), (DST-4, DCT-8), (DCT-8, DST-4)}
T1,intra = {(DST-7, DST-7), (DST-7, DCT-5), (DCT-5, DST-7), (DST-1, DCT-5)}
T2,intra = {(DST-7, DST-7), (DST-7, DCT-8), (DCT-8, DST-7), (DCT-5, DCT-5)}
T3,intra = {(DST-4, DST-4), (DST-4, DCT-5), (DCT-8, DST-4), (DST-1, DST-7)}
T4,intra = {(DST-4, DST-7), (DST-7, DCT-5), (DCT-8, DST-7), (DST-1, DST-7)}
T5,intra = {(DST-7, DST-7), (DST-7, DCT-5), (DCT-8, DST-7), (DST-1, DST-7)}
T6,intra = {(DST-7, DST-7), (DST-7, DCT-5), (DCT-5, DST-7), (DST-1, DST-7)}
In JVET-K0375, the identity transform is applied to blocks that do not exceed 16x16 and that have intra modes within a proximity of the horizontal and vertical intra directions, where the proximity is defined by a threshold based on the block size. If the transform index is equal to 3 and the block satisfies the above conditions, a horizontal and/or vertical identity transform is applied.
In accordance with the techniques of this disclosure, video encoder 200 and video decoder 300 may be configured to select an MTS scheme based on the block size and intra-prediction mode of the block. Video encoder 200 and video decoder 300 may classify blocks into one of sixteen different size groups based on both width and height, e.g., as shown in Table 1 below, where the size groups are denoted {WxH}, with W representing the width in samples and H representing the height in samples:
TABLE 1
0→{4x4} | 1→{4x8} | 2→{4x16} | 3→{4xN} |
4→{8x4} | 5→{8x8} | 6→{8x16} | 7→{8xN} |
8→{16x4} | 9→{16x8} | 10→{16x16} | 11→{16xN} |
12→{Nx4} | 13→{Nx8} | 14→{Nx16} | 15→{NxN} |
In the above, N is a power of 2 and is an integer value greater than 16 (e.g., greater than or equal to 32).
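The Table 1 classification can be sketched as follows. This is a minimal illustration (the function name is not from the disclosure), assuming power-of-two dimensions and treating 32 and larger as the "N" rows/columns:

```c
#include <assert.h>

/* Map a block's width and height to the Table 1 size group 0..15. */
int size_group(int width, int height)
{
    /* Map each dimension to an index 0..3 for {4, 8, 16, N (>= 32)}. */
    int wIdx = (width  >= 32) ? 3 : (width  == 16) ? 2 : (width  == 8) ? 1 : 0;
    int hIdx = (height >= 32) ? 3 : (height == 16) ? 2 : (height == 8) ? 1 : 0;
    /* Table 1 is laid out row-major: one row per width, one column per height. */
    return 4 * wIdx + hIdx;
}
```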
Additionally or alternatively, video encoder 200 and video decoder 300 may classify the prediction mode into one of a plurality of intra-prediction mode groups (e.g., five mode groups) based on the intra-prediction mode information. Table 2 below shows an example of mode group classification:
TABLE 2
Mode group | Intra mode id |
0 | 0<=intramode<=1 |
1 | 2<=intramode<=12 |
2 | 13<=intramode<=23 |
3 | 24<=intramode<=34 |
4 | MIP mode |
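The Table 2 classification can likewise be sketched in code. This is an illustrative function (the name is not from the disclosure), assuming the intra mode has already been reduced to the 0..34 range; modes above 34 are handled by the symmetry mapping described later in this section.

```c
#include <assert.h>

/* Map an intra mode (0..34) and a MIP flag to the Table 2 mode group 0..4. */
int mode_group(int intra_mode, int is_mip)
{
    if (is_mip) return 4;               /* MIP-coded blocks */
    if (intra_mode <= 1)  return 0;     /* planar (0) and DC (1) */
    if (intra_mode <= 12) return 1;
    if (intra_mode <= 23) return 2;
    return 3;                           /* 24..34 */
}
```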
In an example using both size groups (e.g., 16 size groups) and mode groups (e.g., 5 mode groups), a total of 16 x 5 = 80 groups may be considered. Thus, an intra prediction mode and block size may correspond to a particular set of available MTS schemes. An MTS scheme typically represents a combination of transforms, e.g., a horizontal transform and a vertical transform. All possible MTS schemes may be partitioned into sets of available MTS schemes, one set per group of block characteristics (e.g., size group and/or mode group). Each size and/or mode group may have four MTS scheme (transform pair) selections, which may correspond to different signaled values of the MTS index, e.g., cu_mts_idx. Thus, cu_mts_idx may have a value in the range 0 to 3, inclusive, representing a particular MTS scheme of a set of available MTS schemes determined from the current block size and intra prediction mode. In particular, a set of available MTS schemes may be determined based on a size group including the size of the current block (e.g., based on Table 1) and/or a mode group including the intra-prediction mode of the current block (e.g., based on Table 2).
In some examples, the number of transform pairs may depend on the block shape (e.g., whether the width is greater than the height) and/or quantization parameters of the corresponding transform block.
Further, in some examples, video encoder 200 and video decoder 300 may be configured to use joint mode and block symmetry for transform pair design. For example, mode i (i > 34) with block shape AxB will be mapped to the same group as mode (68-i) with block shape BxA. However, for each transform pair in the set, the vertical transform and the horizontal transform will be swapped.
In other words, if a first block has size WxH, is predicted using intra prediction mode i, and is transformed using a transform pair of a horizontal transform and a vertical transform, video encoder 200 and video decoder 300 may select the same transform pair for a second block that has size HxW and is predicted using intra prediction mode (68-i), but apply the horizontal transform as the vertical transform and the vertical transform as the horizontal transform.
For example, assume that a 16x4 block using mode 18 (horizontal prediction) is mapped to a group, and that the signaled cu_mts_idx corresponds to the transform pair {trVer, trHor} = {DCT8, DST7}. Then, a 4x16 block with mode 50 (vertical prediction) will be mapped to the same group, and for the same cu_mts_idx, the transform pair will be {trVer, trHor} = {DST7, DCT8}.
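The symmetry just described can be sketched as a folding step applied before the group lookup. This is an illustrative helper (the name is not from the disclosure); its return value tells the caller whether to swap trVer and trHor of the transform pair looked up for the folded group.

```c
#include <assert.h>

/* Fold mode i (i > 34) with shape AxB onto mode (68 - i) with shape BxA,
 * per the joint mode/block symmetry. Returns 1 if the caller must swap
 * trVer and trHor of the looked-up transform pair, 0 otherwise. */
int fold_mode_shape(int *mode, int *width, int *height)
{
    if (*mode > 34) {
        int tmp = *width;          /* transpose the block shape */
        *width = *height;
        *height = tmp;
        *mode = 68 - *mode;        /* mirror about the diagonal mode 34 */
        return 1;
    }
    return 0;
}
```

For the 4x16/mode-50 example above, folding yields 16x4/mode 18, and the looked-up {DCT8, DST7} pair is then swapped to {DST7, DCT8}.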
For MIP coded blocks, video encoder 200 and video decoder 300 may use the corresponding transpose flag and block shape symmetry to determine the MTS scheme. For example, video encoder 200 and video decoder 300 may map a MIP coded block having shape AxB with the MIP transpose flag on to the same group as a MIP coded block having shape BxA with the MIP transpose flag off.
If the block is coded using DIMD mode, video encoder 200 and video decoder 300 may use the dominant angular mode (the one with the highest weight) to derive the transform pair. Alternatively, if the difference between the two angular mode values is above a threshold, video encoder 200 and video decoder 300 may treat the mode as planar mode (mode 0) to determine the MTS core. Otherwise, if the difference is less than or equal to the threshold, video encoder 200 and video decoder 300 may use only the dominant mode to determine the MTS core.
For wide-angle intra-prediction modes, video encoder 200 and video decoder 300 may use the closest conventional angle mode to determine the transform set. For example, video encoder 200 and video decoder 300 may use mode 2 for all modes between-2 and-14. Likewise, video encoder 200 and video decoder 300 may use mode 66 for modes 67 through 80.
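The wide-angle clamping described above can be sketched as follows (the function name is illustrative; the ranges are taken from the examples in this paragraph):

```c
#include <assert.h>

/* Map wide-angle intra modes to the closest conventional angular mode:
 * modes -14..-2 map to mode 2, and modes 67..80 map to mode 66.
 * All other modes are returned unchanged. */
int clamp_wide_angle(int mode)
{
    if (mode >= -14 && mode <= -2) return 2;
    if (mode >= 67 && mode <= 80)  return 66;
    return mode;
}
```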
An example of a mapping table for deriving the MTS group from the prediction mode and block size (shape) is shown in table 3 below:
TABLE 3
Size \ Mode group | [0,1] | [2-12] | [13-23] | [24-34] | MIP |
4x4 | 0 | 1 | 2 | 3 | 4 |
4x8 | 5 | 6 | 7 | 8 | 9 |
4x16 | 10 | 11 | 12 | 13 | 14 |
4xN | 15 | 16 | 17 | 18 | 19 |
8x4 | 20 | 21 | 22 | 23 | 24 |
8x8 | 25 | 26 | 27 | 28 | 29 |
8x16 | 30 | 31 | 32 | 33 | 34 |
8xN | 35 | 36 | 37 | 38 | 39 |
16x4 | 40 | 41 | 42 | 43 | 44 |
16x8 | 45 | 46 | 47 | 48 | 49 |
16x16 | 50 | 51 | 52 | 53 | 54 |
16xN | 55 | 56 | 57 | 58 | 59 |
32x4 | 60 | 61 | 62 | 63 | 64 |
32x8 | 65 | 66 | 67 | 68 | 69 |
32x16 | 70 | 71 | 72 | 73 | 74 |
32xN | 75 | 76 | 77 | 78 | 79 |
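Assuming the size groups of Table 1 and the mode groups of Table 2, the Table 3 group derivation reduces to a simple row-major computation; a sketch (the function name is illustrative):

```c
#include <assert.h>

/* Table 3 arranges the 80 groups row-major: each size-group row spans
 * five mode-group columns, so the group id is 5 * size_group + mode_group. */
int mts_group(int size_grp, int mode_grp)
{
    return 5 * size_grp + mode_grp;
}
```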
The following is an example mapping of each (size, mode) group to its set of four transform pair indexes, where each index identifies a corresponding transform pair (i.e., MTS scheme):
const uint8_t g_aucTrSet[80][4] = { {17, 18, 23, 24},
{3,7,18,22},
{2,17,18,22},
{3,15,17,18},
{3,12,18,19},
{12,18,19,23},
{2,12,17,18},
{2,17,18,22},
{2,11,17,18},
{12,18,19,23},
{12,13,16,24},
{2,11,16,23},
{2,13,17,22},
{2,11,17,21},
{13,16,19,22},
{7,12,13,18},
{1,11,12,16},
{3,13,17,22},
{1,6,12,22},
{12,13,15,16},
{18,19,23,24},
{2,17,18,24},
{3,4,17,22},
{12,18,19,23},
{12,18,19,23},
{6,12,18,24},
{2,6,12,21},
{1,11,17,22},
{3,11,16,17},
{8,12,19,23},
{7,13,16,23},
{1,6,11,12},
{1,11,17,21},
{6,11,17,21},
{8,11,14,17},
{6,11,12,21},
{1,6,11,12},
{2,6,11,12},
{1,6,11,21},
{7,11,12,16},
{8,12,19,24},
{1,13,18,22},
{2,6,17,21},
{11,12,16,19},
{8,12,17,24},
{6,12,19,21},
{6,12,13,21},
{2,16,17,21},
{6,17,19,23},
{6,12,14,17},
{6,7,11,21},
{1,11,12,16},
{1,6,11,12},
{6,11,12,21},
{7,8,9,11},
{6,7,11,12},
{6,7,11,12},
{1,11,12,16},
{6,11,17,21},
{6,7,11,12},
{12,14,18,21},
{1,11,16,22},
{1,11,16,22},
{7,13,15,16},
{1,8,12,19},
{6,7,9,12},
{2,6,12,13},
{1,12,16,21},
{7,11,16,19},
{7,8,11,12},
{6,7,11,12},
{6,7,11,12},
{1,6,11,12},
{6,7,11,16},
{6,7,11,12},
{6,7,11,12},
{6,11,12,21},
{1,6,11,12},
{6,7,11,12},
{6,7,11,12},};
In the above example, the g_aucTrIdxToTr data structure represents a cluster of 25 possible MTS schemes (transform pairs). These MTS schemes are associated with corresponding index values from 1 to 25. The g_aucTrSet data structure represents a cluster of 80 different sets of MTS schemes. Specifically, the values in each MTS scheme set correspond to indices into the g_aucTrIdxToTr data structure. The size of the block (e.g., a size group) and the intra-prediction mode of the block (e.g., a mode group) may be jointly mapped to one of the entries of the g_aucTrSet data structure. Video encoder 200 and video decoder 300 may also code a transform index representing an index value (0, 1, 2, or 3) into the set of available MTS schemes, i.e., into the entry of the g_aucTrSet data structure to which the block size and intra prediction mode are mapped. Video decoder 300 may use the decoded index value to determine one of the indices in that g_aucTrSet entry and then use that index to determine the corresponding MTS scheme from the g_aucTrIdxToTr data structure.
For example, if the size of the current block is 4x4 and the intra prediction mode is mode 0 or mode 1, the size and intra prediction mode are mapped (according to Table 3) to the first entry of the g_aucTrSet data structure (i.e., {17, 18, 23, 24}). If the decoded transform index has a value of 0, video decoder 300 may determine the selected index to be 17. Using the g_aucTrIdxToTr data structure, video decoder 300 can then determine that the MTS scheme is the 17th transform pair, i.e., {DST4, DST7}. As another example, if the size of the current block is 4x16 and the intra prediction mode of the current block is mode 0 or mode 1, the size and intra prediction mode are mapped to the corresponding entry of the g_aucTrSet data structure (i.e., {12, 18, 19, 23}) according to Table 3. If the decoded transform index has a value of 3, video decoder 300 may determine the selected index to be 23. Using the g_aucTrIdxToTr data structure, video decoder 300 can determine that the MTS scheme is the 23rd transform pair, i.e., {DST1, DCT5}.
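The decoder-side lookup in these examples can be sketched as follows. Only the first two rows of the g_aucTrSet listing are reproduced; the two g_aucTrIdxToTr entries shown (17 -> {DST4, DST7} and 23 -> {DST1, DCT5}) are taken from the worked examples, and the remainder of that table is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* First two rows of g_aucTrSet from the listing above. */
static const unsigned char kTrSet[2][4] = {
    {17, 18, 23, 24},   /* group 0: 4x4, modes 0..1  */
    { 3,  7, 18, 22},   /* group 1: 4x4, modes 2..12 */
};

/* Partial stand-in for g_aucTrIdxToTr: only the entries used by the
 * worked examples are known here. */
const char *tr_pair_name(int tr_idx)
{
    switch (tr_idx) {
    case 17: return "{DST4, DST7}";
    case 23: return "{DST1, DCT5}";
    default: return "(entry not reproduced here)";
    }
}

/* group: joint size/mode group (Table 3); mts_idx: decoded index 0..3. */
const char *resolve_mts_scheme(int group, int mts_idx)
{
    return tr_pair_name(kTrSet[group][mts_idx]);
}
```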
As described above, in some examples, when TIMD is activated, video encoder 200 and video decoder 300 may use an extended (e.g., doubled) number of intra modes. That is, the angles of the intra modes may be arranged twice as densely. Various techniques for deriving the transform kernels are described below.
In one example, when the TIMD mode includes one intra mode for intra prediction (i.e., no fusion), video encoder 200 or video decoder 300 may map the intra mode to the VVC intra mode having the closest angle (selected from among the 67 VVC intra modes plus the wide-angle modes). Video encoder 200 or video decoder 300 may then use the mapped mode to determine the MTS core. If the VVC intra modes form a subset of the extended intra modes (i.e., every other intra mode in the extended set corresponds to a VVC intra mode), then the conversion may be as follows (mode 0 and mode 1 are non-angular modes, so the conversion does not affect their values):
mode = (mode < 2) ? mode : ((mode >> 1) + 1)
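A minimal sketch of this conversion (the function name is illustrative):

```c
#include <assert.h>

/* Fold an extended TIMD intra mode onto the VVC intra mode with the
 * closest angle; non-angular modes 0 and 1 are unaffected. */
int timd_to_vvc_mode(int mode)
{
    return (mode < 2) ? mode : ((mode >> 1) + 1);
}
```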
When the TIMD mode uses fusion (i.e., generating the final intra prediction involves two modes), video encoder 200 or video decoder 300 may map only the dominant mode (the one with lower distortion) to a VVC intra mode to determine the MTS kernel.
According to the conventional Enhanced Compression Model (ECM), a video coder, such as video encoder 200 or video decoder 300, will employ an intra-mode-based low-frequency non-separable transform (LFNST). In accordance with the techniques of this disclosure, video encoder 200 or video decoder 300 may also be configured to apply the techniques described above to the LFNST transform core.
In another example, when the TIMD mode is used, a look-up table (LUT) or mapping table may be specified for mapping intra modes to transform cores. The table may be specified for the extended (doubled) set of angles, and video encoder 200 and video decoder 300 may be configured with the table.
When video encoder 200 or video decoder 300 applies TIMD with fusion (i.e., generating final intra prediction involves two modes), video encoder 200 and video decoder 300 may use only the dominant mode to determine the MTS kernel.
Alternatively, when the difference between the two mode values is above the threshold, the video encoder 200 and video decoder 300 may treat the mode as a planar mode (mode 0) to determine the MTS core. Otherwise, if the difference is less than or equal to the threshold, the video encoder 200 and the video decoder 300 may determine the MTS core using only the dominant mode.
In another example, when the TIMD mode is used, the video encoder 200 and video decoder 300 may disable MTS, i.e., only DCT2 may be used for TIMD. In this case, video encoder 200 may refrain from signaling mts_idx and video decoder 300 may determine that mts_idx is not signaled and instead infer the value of mts_idx. Such disabling may also depend on block sizes, e.g., MTS may be disabled for certain block sizes. Similarly, LFNST may also be disabled, optionally in combination with block size limitations, when TIMD decoding is used.
Fig. 7 is a block diagram illustrating an example video encoder 200 that may perform the techniques of this disclosure. Fig. 7 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 200 in accordance with the techniques of VVC (ITU-T H.266, under development) and HEVC (ITU-T H.265). However, the techniques of this disclosure may be performed by video encoding devices that are configured according to other video coding standards and video coding formats, such as AV1 and successors to the AV1 video coding format.
In the example of fig. 7, video encoder 200 includes video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, decoded picture buffer (DPB) 218, multiple transform selection (MTS) groups 232, and entropy encoding unit 220. Any or all of video data memory 230, mode selection unit 202, residual generation unit 204, transform processing unit 206, quantization unit 208, inverse quantization unit 210, inverse transform processing unit 212, reconstruction unit 214, filter unit 216, DPB 218, MTS groups 232, and entropy encoding unit 220 may be implemented in one or more processors or in processing circuitry. For example, the units of video encoder 200 may be implemented as one or more circuits or logic elements, as part of a hardware circuit, or as part of a processor, ASIC, or FPGA. Moreover, video encoder 200 may include additional or alternative processors or processing circuitry to perform these and other functions.
Video data memory 230 may store video data to be encoded by the components of video encoder 200. Video encoder 200 may receive the video data stored in video data memory 230 from, for example, video source 104 (fig. 1). DPB 218 may act as a reference picture memory that stores reference video data for use in predicting subsequent video data by video encoder 200. Video data memory 230 and DPB 218 may be formed by any of a variety of memory devices, such as dynamic random access memory (DRAM), including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. Video data memory 230 and DPB 218 may be provided by the same memory device or separate memory devices. In various examples, video data memory 230 may be on-chip with other components of video encoder 200 (as shown), or off-chip relative to those components.
In this disclosure, references to video data memory 230 should not be interpreted as being limited to memory internal to video encoder 200 (unless specifically described as such) or memory external to video encoder 200 (unless specifically described as such). Rather, references to video data memory 230 should be understood as references to a memory that stores video data received by video encoder 200 for encoding (e.g., video data for a current block to be encoded). Memory 106 of fig. 1 may also provide temporary storage of outputs from the various units of video encoder 200.
The various units of fig. 7 are shown to aid understanding of the operations performed by video encoder 200. The units may be implemented as fixed-function circuits, programmable circuits, or a combination thereof. Fixed-function circuits refer to circuits that provide particular functionality and are preset in the operations that can be performed. Programmable circuits refer to circuits that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, programmable circuits may execute software or firmware that causes the programmable circuits to operate in the manner defined by the instructions of the software or firmware. Fixed-function circuits may execute software instructions (e.g., to receive parameters or output parameters), but the types of operations that the fixed-function circuits perform are generally immutable. In some examples, one or more of the units may be distinct circuit blocks (fixed-function or programmable), and in some examples, one or more of the units may be integrated circuits.
Video encoder 200 may include arithmetic logic units (ALUs), elementary function units (EFUs), digital circuits, analog circuits, and/or programmable cores formed from programmable circuits. In examples where the operations of video encoder 200 are performed using software executed by the programmable circuits, memory 106 (fig. 1) may store the instructions (e.g., object code) of the software that video encoder 200 receives and executes, or another memory within video encoder 200 (not shown) may store such instructions.
The video data memory 230 is configured to store received video data. The video encoder 200 may retrieve pictures of the video data from the video data memory 230 and provide the video data to the residual generation unit 204 and the mode selection unit 202. The video data in the video data memory 230 may be raw video data to be encoded.
The mode selection unit 202 comprises a motion estimation unit 222, a motion compensation unit 224 and an intra prediction unit 226. The mode selection unit 202 may include additional functional units that perform video prediction according to other prediction modes. As an example, mode selection unit 202 may include a palette unit, an intra-block copy unit (which may be part of motion estimation unit 222 and/or motion compensation unit 224), an affine unit, a Linear Model (LM) unit, and the like.
Mode selection unit 202 generally coordinates multiple encoding passes to test combinations of encoding parameters and the resulting rate-distortion values for such combinations. The encoding parameters may include partitioning of CTUs into CUs, prediction modes for the CUs, transform types for residual data of the CUs, quantization parameters for residual data of the CUs, and so on. Mode selection unit 202 may ultimately select the combination of encoding parameters having rate-distortion values that are better than the other tested combinations.
Video encoder 200 may partition a picture retrieved from video data memory 230 into a series of CTUs and encapsulate one or more CTUs within a slice. Mode selection unit 202 may partition the CTUs of the picture according to a tree structure, such as the MTT structure, QTBT structure, superblock structure, or quadtree structure described above. As described above, video encoder 200 may form one or more CUs by partitioning a CTU according to a tree structure. Such CUs may also be generally referred to as "video blocks" or "blocks."
Typically, mode selection unit 202 also controls its components (e.g., motion estimation unit 222, motion compensation unit 224, and intra prediction unit 226) to generate a prediction block for the current block (e.g., the current CU, or in HEVC, the overlapping portion of PU and TU). To inter-predict the current block, motion estimation unit 222 may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (e.g., one or more previously coded pictures stored in DPB 218). In particular, the motion estimation unit 222 may calculate a value indicating how similar the potential reference block will be to the current block, for example, from the Sum of Absolute Differences (SAD), sum of Squared Differences (SSD), mean Absolute Difference (MAD), mean Squared Difference (MSD), etc. The motion estimation unit 222 may typically perform these calculations using sample-by-sample differences between the current block and the reference block under consideration. The motion estimation unit 222 may identify the reference block with the lowest value resulting from these calculations, which indicates the reference block that most closely matches the current block.
The motion estimation unit 222 may form one or more Motion Vectors (MVs) defining a position of the reference block in the reference picture relative to a position of the current block in the current picture. The motion estimation unit 222 may then provide the motion vectors to the motion compensation unit 224. For example, for unidirectional inter prediction, the motion estimation unit 222 may provide a single motion vector, while for bidirectional inter prediction, the motion estimation unit 222 may provide two motion vectors. The motion compensation unit 224 may then generate a prediction block using the motion vector. For example, the motion compensation unit 224 may use the motion vector to retrieve the data of the reference block. As another example, if the motion vector has fractional sample precision, the motion compensation unit 224 may interpolate values for the prediction block according to one or more interpolation filters. Furthermore, for bi-directional inter prediction, the motion compensation unit 224 may retrieve data for the two reference blocks identified by the respective motion vectors and combine the retrieved data, e.g. by sample-wise averaging or weighted averaging.
When operating in accordance with the AV1 video coding format, motion estimation unit 222 and motion compensation unit 224 may be configured to encode coding blocks (e.g., both luma coding blocks and chroma coding blocks) of video data using translational motion compensation, affine motion compensation, overlapped Block Motion Compensation (OBMC), and/or composite intra prediction.
As another example, for intra prediction or intra prediction coding, the intra prediction unit 226 may generate a prediction block from samples adjacent to the current block. For example, for directional modes, intra-prediction unit 226 may typically mathematically combine the values of adjacent samples and populate these calculated values in defined directions across the current block to produce a prediction block. As another example, for the DC mode, the intra prediction unit 226 may calculate an average value of neighboring samples of the current block and generate the prediction block to include the resulting average value for each sample of the prediction block.
When operating in accordance with the AV1 video coding format, intra prediction unit 226 may be configured to encode coding blocks of video data (e.g., both luma and chroma coding blocks) using directional intra prediction, non-directional intra prediction, recursive filter intra prediction, chroma-from-luma (CFL) prediction, intra block copy (IBC), and/or palette modes. Mode selection unit 202 may include additional functional units to perform video prediction in accordance with other prediction modes.
Mode selection unit 202 provides the prediction block to residual generation unit 204. Residual generation unit 204 receives the original, uncoded version of the current block from video data memory 230 and the prediction block from mode selection unit 202. Residual generation unit 204 calculates sample-by-sample differences between the current block and the prediction block. The resulting sample-by-sample differences define the residual block for the current block. In some examples, residual generation unit 204 may also determine differences between sample values in the residual block to generate the residual block using residual differential pulse code modulation (RDPCM). In some examples, residual generation unit 204 may be formed using one or more subtractor circuits that perform binary subtraction.
In examples in which mode selection unit 202 partitions a CU into PUs, each PU may be associated with a luma prediction unit and corresponding chroma prediction units. Video encoder 200 and video decoder 300 may support PUs having various sizes. As noted above, the size of a CU may refer to the size of the luma coding block of the CU, and the size of a PU may refer to the size of the luma prediction unit of the PU. Assuming that the size of a particular CU is 2Nx2N, video encoder 200 may support PU sizes of 2Nx2N or NxN for intra prediction, and symmetric PU sizes of 2Nx2N, 2NxN, Nx2N, NxN, or similar for inter prediction. Video encoder 200 and video decoder 300 may also support asymmetric partitioning for PU sizes of 2NxnU, 2NxnD, nLx2N, and nRx2N for inter prediction.
In examples in which mode selection unit 202 does not further partition a CU into PUs, each CU may be associated with a luma coding block and corresponding chroma coding blocks. As above, the size of a CU may refer to the size of the luma coding block of the CU. Video encoder 200 and video decoder 300 may support CU sizes of 2Nx2N, 2NxN, or Nx2N.
For other video coding techniques, such as intra-block copy mode coding, affine mode coding, and Linear Model (LM) mode coding, to name a few examples, mode selection unit 202 generates a prediction block for the current block being encoded via a respective unit associated with the coding technique. In some examples (such as palette mode coding), mode selection unit 202 may not generate a prediction block, but instead generate a syntax element indicating the manner in which it reconstructs the block based on the selected palette. In such a mode, the mode selection unit 202 may provide these syntax elements to the entropy encoding unit 220 for encoding.
As described above, the residual generation unit 204 receives video data for the current block and the corresponding prediction block. Then, the residual generating unit 204 generates a residual block for the current block. In order to generate the residual block, the residual generation unit 204 calculates a sample-by-sample difference between the prediction block and the current block.
Transform processing unit 206 applies one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loeve transform (KLT), or a conceptually similar transform to the residual block. In some examples, transform processing unit 206 may perform multiple transforms on a residual block, e.g., a primary transform and a secondary transform (such as a rotational transform). In some examples, transform processing unit 206 does not apply transforms to a residual block.
In accordance with the techniques of this disclosure, transform processing unit 206 may receive data representing a size and a prediction mode (e.g., intra-prediction mode) of a current block of video data. The transform processing unit 206 may determine the MTS group from the MTS groups 232 based on the size of the current block and the prediction mode. For example, the transform processing unit 206 may determine a size group including the size of the current block, for example, according to table 1 as described above. As another example, additionally or alternatively, transform processing unit 206 may determine a mode group including an intra prediction mode of the current block, e.g., according to table 2 above. The transform processing unit 206 may then select the MTS group to which the size and intra-prediction mode (e.g., size group and/or mode group) are mapped from the MTS group 232, e.g., as discussed above with respect to table 3. Also, in some examples, transform processing unit 206 may utilize symmetry of block sizes and/or intra prediction modes, where MxN sized blocks may be mapped to the same MTS group as NxM sized blocks, e.g., as described above.
The transformation processing unit 206 may evaluate each MTS scheme in the determined MTS group. The transform processing unit 206 may select one of the MTS schemes from the group that produces the lowest energy transform block (e.g., the transform block with the most zero-valued coefficients or the lowest average coefficient value). The transform processing unit 206 may then send the index values to the entropy encoding unit 220 to be encoded as transform indexes, where the transform indexes identify the determined MTS schemes in the MTS group. The transform processing unit 206 may also provide the index value to the inverse transform processing unit 212.
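As an illustration of this selection step, the following sketch picks, from four candidate transform blocks, the scheme with the most zero-valued coefficients. The function name, the fixed 4x16-coefficient layout, and the zero-count criterion (one of the two criteria mentioned above) are illustrative assumptions, not the disclosed implementation.

```c
#include <assert.h>

/* Among four candidate MTS schemes, return the index (0..3) of the one
 * whose transform block contains the most zero-valued coefficients. */
int pick_mts_scheme(const int coeffs[4][16], int num_coeffs)
{
    int best = 0, best_zeros = -1;
    for (int s = 0; s < 4; s++) {
        int zeros = 0;
        for (int i = 0; i < num_coeffs; i++)
            if (coeffs[s][i] == 0)
                zeros++;
        if (zeros > best_zeros) {
            best = s;
            best_zeros = zeros;
        }
    }
    return best;   /* later signaled as the transform index */
}
```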
When operating according to AV1, transform processing unit 206 may apply one or more transforms to the residual block to generate a block of transform coefficients (referred to herein as a "transform coefficient block"). Transform processing unit 206 may apply various transforms to a residual block to form the transform coefficient block. For example, transform processing unit 206 may apply a horizontal/vertical transform combination, which may include a discrete cosine transform (DCT), an asymmetric discrete sine transform (ADST), a flipped ADST (e.g., ADST in reverse order), and an identity transform (IDTX). When an identity transform is used, the transform is skipped in one of the vertical or horizontal directions. In some examples, transform processing may be skipped.
The quantization unit 208 may quantize the transform coefficients in the block of transform coefficients to generate a block of quantized transform coefficients. The quantization unit 208 may quantize the transform coefficients of the block of transform coefficients according to the QP value associated with the current block. The video encoder 200 (e.g., via the mode selection unit 202) may adjust the degree of quantization applied to the transform coefficient block associated with the current block by adjusting the QP value associated with the CU. Quantization may cause information loss and, therefore, the quantized transform coefficients may have lower precision than the original transform coefficients generated by the transform processing unit 206.
The inverse quantization unit 210 and the inverse transform processing unit 212 may apply inverse quantization and inverse transform, respectively, to the quantized transform coefficient block to reconstruct a residual block from the transform coefficient block. The reconstruction unit 214 may generate a reconstructed block corresponding to the current block (although potentially with some degree of distortion) based on the reconstructed residual block and the prediction block generated by the mode selection unit 202. For example, the reconstruction unit 214 may add samples of the reconstructed residual block with corresponding samples from the prediction block generated by the mode selection unit 202 to generate a reconstructed block.
In accordance with the techniques of this disclosure, the inverse transform processing unit 212 may receive data representing the size and prediction mode (e.g., intra prediction mode) of a current block of video data. The inverse transform processing unit 212 may determine an MTS group from the MTS groups 232 according to the size of the current block and the prediction mode. For example, the inverse transform processing unit 212 may determine a size group including the size of the current block, for example, according to table 1 as described above. As another example, additionally or alternatively, the inverse transform processing unit 212 may determine a mode group including an intra prediction mode of the current block, for example, according to table 2 above. The inverse transform processing unit 212 may also receive a transform index from the transform processing unit 206. Using the transform index, the inverse transform processing unit 212 may determine the MTS scheme in the MTS group from the MTS group 232 to which the size and intra-prediction modes (e.g., size group and/or mode group) are mapped, e.g., as discussed above with respect to table 3. Also, in some examples, inverse transform processing unit 212 may utilize symmetry of block sizes and/or intra prediction modes, where MxN sized blocks may be mapped to the same MTS group as NxM sized blocks, e.g., as described above. The inverse transform processing unit 212 may inverse transform the transform block using the determined MTS scheme.
The filter unit 216 may perform one or more filter operations on the reconstructed block. For example, the filter unit 216 may perform deblocking operations to reduce blocking artifacts along edges of CUs. In some examples, the operation of the filter unit 216 may be skipped.
When operating in accordance with AV1, the filter unit 216 may perform one or more filter operations on the reconstructed block. For example, the filter unit 216 may perform deblocking operations to reduce blocking artifacts along edges of CUs. In other examples, the filter unit 216 may apply a constrained directional enhancement filter (CDEF), which may be applied after deblocking and may include applying a non-separable, non-linear, low-pass directional filter based on estimated edge directions. The filter unit 216 may also apply a loop restoration filter, which is applied after CDEF and may include a separable symmetric normalized Wiener filter or a dual self-guided filter.
Video encoder 200 stores the reconstructed block in DPB 218. For example, in an example in which the operation of filter unit 216 is not performed, reconstruction unit 214 may store the reconstructed block into DPB 218. In an example in which the operation of filter unit 216 is performed, filter unit 216 may store the filtered reconstructed block into DPB 218. Motion estimation unit 222 and motion compensation unit 224 may retrieve a reference picture formed from the reconstructed (and potentially filtered) block from DPB218 to inter-predict a block of a subsequently encoded picture. In addition, intra-prediction unit 226 may use the reconstructed block of the current picture in DPB218 to intra-predict other blocks in the current picture.
In general, entropy encoding unit 220 may entropy encode syntax elements received from other functional components of video encoder 200. For example, entropy encoding unit 220 may entropy encode the quantized transform coefficient block from quantization unit 208. As another example, the entropy encoding unit 220 may entropy encode a prediction syntax element (e.g., motion information for inter prediction or intra mode information for intra prediction) from the mode selection unit 202. The entropy encoding unit 220 may perform one or more entropy encoding operations on syntax elements, which are another example of video data, to generate entropy encoded data. For example, the entropy encoding unit 220 may perform a context adaptive variable length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an exponential Golomb coding operation, or another type of entropy encoding operation on the data. In some examples, entropy encoding unit 220 may operate in a bypass mode in which syntax elements are not entropy encoded.
The video encoder 200 may output a bitstream including entropy encoded syntax elements required to reconstruct blocks of slices or pictures. In particular, the entropy encoding unit 220 may output a bitstream.
According to AV1, the entropy encoding unit 220 may be configured as a symbol-to-symbol adaptive multi-symbol arithmetic coder. A syntax element in AV1 includes an alphabet of N elements, and a context (e.g., probability model) includes a set of N probabilities. The entropy encoding unit 220 may store the probabilities as n-bit (e.g., 15-bit) cumulative distribution functions (CDFs). The entropy encoding unit 220 may perform recursive scaling, with an update factor based on the alphabet size, to update the contexts.
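A simplified sketch of this kind of multi-symbol CDF adaptation is shown below. The fixed update rate and the exact update rule are assumptions for illustration; AV1's actual recursive-scaling rule derives the rate from the alphabet size and a per-context counter:

```python
# Simplified sketch of an adaptive multi-symbol CDF update; the fixed
# rate below is illustrative, not AV1's exact alphabet-size-based rule.
PROB_TOTAL = 1 << 15  # 15-bit cumulative probabilities

def update_cdf(cdf, symbol, rate=5):
    """Move the 15-bit CDF toward the just-coded symbol (recursive scaling).

    cdf[i] approximates P(x <= i) * PROB_TOTAL; the last entry stays fixed
    at PROB_TOTAL so the distribution always sums to one.
    """
    for i in range(len(cdf) - 1):
        if i < symbol:
            cdf[i] -= cdf[i] >> rate                   # shrink mass below symbol
        else:
            cdf[i] += (PROB_TOTAL - cdf[i]) >> rate    # grow mass at/above it
    return cdf

cdf = [8192, 16384, 24576, 32768]  # uniform 4-symbol alphabet
update_cdf(cdf, 1)                 # coding symbol 1 raises its probability
```

After the update, the interval assigned to symbol 1 widens while the other symbols shrink proportionally, which is the adaptation behavior the recursive scaling provides.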
The operations described above are described with respect to blocks. Such descriptions should be understood as operations for luma coding blocks and/or chroma coding blocks. As described above, in some examples, the luma and chroma coding blocks are the luma and chroma components of a CU. In some examples, the luma and chroma coding blocks are luma and chroma components of the PU.
In some examples, the operations performed with respect to luma coded blocks need not be repeated for chroma coded blocks. As one example, operations of identifying Motion Vectors (MVs) of luma coded blocks and reference pictures need not be repeated to identify MVs and reference pictures of chroma blocks. Conversely, MVs for luma coding blocks may be scaled to determine MVs for chroma blocks, and reference pictures may be the same. As another example, the intra prediction process may be the same for both luma and chroma coded blocks.
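As a hedged illustration of the luma-to-chroma MV reuse mentioned above, the sketch below scales a luma MV for a subsampled chroma plane. The 4:2:0 subsampling factors and the MV units are assumptions for illustration, not taken from this disclosure:

```python
# Illustrative sketch of deriving a chroma MV from a luma MV. The 4:2:0
# subsampling factors (one halving per dimension) are an assumption.

def chroma_mv_from_luma(luma_mv, subsample_x=1, subsample_y=1):
    """Scale a luma motion vector (mvx, mvy) for a subsampled chroma plane.

    With 4:2:0 content the chroma planes have half the luma resolution in
    each dimension, so the same displacement covers half as many chroma
    samples; the reference picture itself is unchanged.
    """
    mvx, mvy = luma_mv
    return (mvx / (1 << subsample_x), mvy / (1 << subsample_y))

print(chroma_mv_from_luma((6, -4)))  # -> (3.0, -2.0)
```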
The video encoder 200 may be configured to apply any of the techniques of this disclosure to determine and signal the MTS scheme of the blocks of video data.
Fig. 8 is a block diagram illustrating an example video decoder 300 that may perform the techniques of this disclosure. Fig. 8 is provided for purposes of explanation and is not a limitation on the techniques broadly illustrated and described in this disclosure. For purposes of explanation, the present disclosure describes video decoder 300 in terms of techniques of VVC (ITU-t h.266, under development) and HEVC (ITU-t h.265). However, the techniques of this disclosure may be performed by video coding devices configured for other video coding standards.
In the example of fig. 8, video decoder 300 includes Coded Picture Buffer (CPB) memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, MTS group 322, and Decoded Picture Buffer (DPB) 314. Any or all of CPB memory 320, entropy decoding unit 302, prediction processing unit 304, inverse quantization unit 306, inverse transform processing unit 308, reconstruction unit 310, filter unit 312, MTS group 322, and DPB 314 may be implemented in one or more processors or in processing circuitry. For example, the elements of video decoder 300 may be implemented as one or more circuits or logic elements as part of a hardware circuit, or as part of a processor, ASIC, or FPGA. Furthermore, the video decoder 300 may include additional or alternative processors or processing circuits to perform these and other functions.
The prediction processing unit 304 includes a motion compensation unit 316 and an intra prediction unit 318. The prediction processing unit 304 may include additional units that perform prediction in accordance with other prediction modes. As examples, the prediction processing unit 304 may include a palette unit, an intra-block copy unit (which may form part of the motion compensation unit 316), an affine unit, a linear model (LM) unit, and the like. In other examples, video decoder 300 may include more, fewer, or different functional components.
When operating in accordance with AV1, the motion compensation unit 316 may be configured to decode coded blocks of video data (e.g., both luma coded blocks and chroma coded blocks) using translational motion compensation, affine motion compensation, OBMC, and/or compound intra-inter prediction, as described above. Intra-prediction unit 318 may be configured to decode coded blocks of video data (e.g., both luma coded blocks and chroma coded blocks) using directional intra-prediction, non-directional intra-prediction, recursive filter intra-prediction, CFL prediction, intra-block copy (IBC), and/or palette modes, as described above.
The CPB memory 320 may store video data, such as an encoded video bitstream, to be decoded by components of the video decoder 300. For example, video data stored in the CPB memory 320 may be obtained from the computer-readable medium 110 (fig. 1). The CPB memory 320 may include CPBs that store encoded video data (e.g., syntax elements) from an encoded video bitstream. Further, the CPB memory 320 may store video data other than syntax elements of the coded pictures, such as temporary data representing outputs from various units of the video decoder 300. DPB314 typically stores decoded pictures, which video decoder 300 may output and/or use as reference video data when decoding subsequent data or pictures of an encoded video bitstream. CPB memory 320 and DPB314 may be formed from any of a variety of memory devices, such as Dynamic Random Access Memory (DRAM) (including Synchronous DRAM (SDRAM)), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices. CPB memory 320 and DPB314 may be provided by the same memory device or separate memory devices. In various examples, CPB memory 320 may be on-chip with other components of video decoder 300, or off-chip with respect to those components.
Additionally or alternatively, in some examples, video decoder 300 may retrieve decoded video data from memory 120 (fig. 1). That is, memory 120 may utilize CPB memory 320 to store data as discussed above. Also, when some or all of the functions of the video decoder 300 are implemented in software to be executed by the processing circuitry of the video decoder 300, the memory 120 may store instructions to be executed by the video decoder 300.
The various units shown in fig. 8 are shown to aid in understanding the operations performed by the video decoder 300. The units may be implemented as fixed function circuits, programmable circuits or a combination thereof. Similar to fig. 7, the fixed function circuit refers to a circuit that provides a specific function and is preset with respect to operations that can be performed. Programmable circuitry refers to circuitry that can be programmed to perform various tasks and provide flexible functionality in the operations that can be performed. For example, the programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by instructions of the software or firmware. Fixed function circuitry may execute software instructions (e.g., receive parameters or output parameters) but the type of operation that fixed function circuitry performs is typically not variable. In some examples, one or more of the units may be different circuit blocks (fixed function or programmable), and in some examples, one or more of the units may be an integrated circuit.
The video decoder 300 may include an ALU, an EFU, a digital circuit, an analog circuit, and/or a programmable core formed of programmable circuitry. In examples where the operations of video decoder 300 are performed by software executing on programmable circuits, on-chip or off-chip memory may store instructions (e.g., object code) of the software received and executed by video decoder 300.
The entropy decoding unit 302 may receive encoded video data from the CPB and entropy decode the video data to reproduce syntax elements. The prediction processing unit 304, the inverse quantization unit 306, the inverse transform processing unit 308, the reconstruction unit 310, and the filter unit 312 may generate decoded video data based on the syntax elements extracted from the bitstream.
Typically, the video decoder 300 reconstructs the pictures block by block. The video decoder 300 may perform a reconstruction operation on each block separately (where the block currently being reconstructed (i.e., decoded) may be referred to as a "current block").
The entropy decoding unit 302 may entropy decode syntax elements defining the quantized transform coefficients of a quantized transform coefficient block, as well as transform information such as a quantization parameter (QP) and/or transform mode indication(s). The inverse quantization unit 306 may use the QP associated with the quantized transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for the inverse quantization unit 306 to apply. The inverse quantization unit 306 may, for example, perform a bitwise left-shift operation to inverse quantize the quantized transform coefficients. The inverse quantization unit 306 may thereby form a transform coefficient block including transform coefficients.
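A minimal sketch of left-shift-based inverse quantization follows. The mapping from QP to shift amount (qp // 6) is an assumption for illustration; actual codecs combine a QP-derived scale factor with a shift:

```python
# Minimal sketch of scalar inverse quantization via bitwise left shift.
# The qp // 6 mapping from QP to shift amount is an illustrative assumption.

def inverse_quantize(quantized_coeffs, qp):
    """Rescale quantized coefficients toward their original magnitude."""
    shift = qp // 6  # assumed: quantization step doubles every 6 QP values
    return [c << shift for c in quantized_coeffs]

print(inverse_quantize([3, -1, 0, 2], qp=18))  # shift = 3 -> [24, -8, 0, 16]
```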
After the inverse quantization unit 306 forms the transform coefficient block, the inverse transform processing unit 308 may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block. For example, the inverse transform processing unit 308 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotation transform, an inverse direction transform, or another inverse transform to the transform coefficient block.
In accordance with the techniques of this disclosure, inverse transform processing unit 308 may receive data representing a size and a prediction mode (e.g., an intra-prediction mode) of a current block of video data from entropy decoding unit 302 and/or prediction processing unit 304. The inverse transform processing unit 308 may determine the MTS group from the MTS group 322 according to the size of the current block and the prediction mode. For example, the inverse transform processing unit 308 may determine a size group including the size of the current block, for example, according to table 1 as described above. As another example, additionally or alternatively, the inverse transform processing unit 308 may determine a mode group including an intra prediction mode of the current block, for example, according to table 2 above. The inverse transform processing unit 308 may also receive a transform index from the entropy decoding unit 302. Using the transform index, the inverse transform processing unit 308 may determine the MTS scheme in the MTS group from the MTS group 322 to which the size and intra-prediction modes (e.g., size group and/or mode group) are mapped, e.g., as discussed above with respect to table 3. Also, in some examples, inverse transform processing unit 308 may utilize symmetry of block sizes and/or intra prediction modes, where MxN sized blocks may be mapped to the same MTS group as NxM sized blocks, e.g., as described above. The inverse transform processing unit 308 may inverse transform the transform block using the determined MTS scheme.
Further, the prediction processing unit 304 generates a prediction block from the prediction information syntax element entropy-decoded by the entropy decoding unit 302. For example, if the prediction information syntax element indicates that the current block is inter predicted, the motion compensation unit 316 may generate the prediction block. In this case, the prediction information syntax element may indicate a reference picture in DPB314 from which to retrieve the reference block, and a motion vector identifying a position of the reference block in the reference picture relative to a position of the current block in the current picture. Motion compensation unit 316 may generally perform the inter-prediction process in a substantially similar manner as described with respect to motion compensation unit 224 (fig. 7).
As another example, if the prediction information syntax element indicates that the current block is intra-predicted, the intra-prediction unit 318 may generate the prediction block according to the intra-prediction mode indicated by the prediction information syntax element. Again, intra-prediction unit 318 may generally perform an intra-prediction process in a substantially similar manner as described with respect to intra-prediction unit 226 (fig. 7). Intra-prediction unit 318 may retrieve data for neighboring samples of the current block from DPB 314.
The reconstruction unit 310 may reconstruct the current block using the prediction block and the residual block. For example, the reconstruction unit 310 may reconstruct the current block by adding samples of the residual block and corresponding samples of the prediction block.
The filter unit 312 may perform one or more filter operations on the reconstructed block. For example, the filter unit 312 may perform a deblocking operation to reduce blocking artifacts along edges of the reconstructed block. The operation of the filter unit 312 is not necessarily performed in all examples.
Video decoder 300 may store the reconstructed block in DPB 314. For example, in an example in which the operation of filter unit 312 is not performed, reconstruction unit 310 may store the reconstructed block into DPB 314. In an example in which the operations of filter unit 312 are performed, filter unit 312 may store the filtered reconstructed block into DPB 314. As discussed above, DPB314 may provide reference information (such as samples of the current picture for intra prediction and previously decoded pictures for subsequent motion compensation) to prediction processing unit 304. Further, video decoder 300 may output decoded pictures (e.g., decoded video) from DPB314 for subsequent presentation on a display device, such as display device 118 of fig. 1.
In this manner, video decoder 300 represents an example of an apparatus for decoding video data, the apparatus comprising: a memory configured to store video data; and one or more processors implemented in the circuitry and configured to: determining a size of a current block of video data; determining an intra prediction mode for a current block of video data; determining a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes, such that each possible intra prediction mode is not included in more than one of the mode groups; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to a size of the current block and an intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from a set of available MTS schemes based on the determined pattern groups; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decoding the current block using the residual block.
Fig. 9 is a flowchart illustrating an example method for encoding a current block in accordance with the techniques of this disclosure. The current block may include the current CU. Although described with respect to video encoder 200 (fig. 1 and 7), it should be understood that other devices may be configured to perform a method similar to the method of fig. 9.
In this example, video encoder 200 initially predicts the current block (350). For example, the video encoder 200 may form a prediction block of the current block using an intra prediction mode. The video encoder 200 may then calculate a residual block for the current block (352). To calculate the residual block, the video encoder 200 may calculate a difference between the original non-coded block and the prediction block for the current block. The video encoder 200 may then transform the residual block and quantize the transform coefficients of the residual block (354). In particular, the video encoder 200 may determine the MTS scheme to apply to the residual block according to any of the various techniques of this disclosure (e.g., according to the block size and intra prediction mode of the block). Next, video encoder 200 may scan quantized transform coefficients of the residual block (356). During or after scanning, video encoder 200 may entropy encode the transform coefficients (358). For example, the video encoder 200 may encode the transform coefficients using CAVLC or CABAC. The video encoder 200 may then output entropy encoded data of the block (360).
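The encoding loop of fig. 9 can be sketched at a high level as follows. Every helper here is a toy stand-in for the corresponding unit of fig. 7 (DC intra prediction, a placeholder transform/quantization, a raster scan instead of CABAC), not an actual implementation:

```python
# Toy sketch of the fig. 9 encoding loop (steps 350-360); all helpers are
# placeholder stand-ins for the functional units of fig. 7.

def intra_predict(block, mode):                 # (350) toy DC prediction
    n = sum(len(row) for row in block)
    dc = sum(sum(row) for row in block) // n
    return [[dc] * len(row) for row in block]

def residual_of(block, pred):                   # (352) residual = block - pred
    return [[b - p for b, p in zip(br, pr)] for br, pr in zip(block, pred)]

def transform_and_quantize(residual, shift=1):  # (354) placeholder for MTS + quant
    return [[r >> shift for r in row] for row in residual]

def scan_and_encode(coeffs):                    # (356)-(358) raster scan, no CABAC
    return [c for row in coeffs for c in row]

def encode_block(block, mode=1):
    pred = intra_predict(block, mode)
    residual = residual_of(block, pred)
    coeffs = transform_and_quantize(residual)
    return scan_and_encode(coeffs)              # (360) toy "bitstream"

print(encode_block([[10, 12], [14, 16]]))  # -> [-2, -1, 0, 1]
```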
The video encoder 200 may also decode the current block after encoding the current block to use the decoded version of the current block as reference data for subsequently coded data (e.g., in inter or intra prediction mode). Accordingly, the video encoder 200 may inverse quantize and inverse transform the coefficients to reproduce the residual block (362). The video encoder 200 may combine the residual block with the prediction block to form a decoded block (364). Video encoder 200 may then store the decoded blocks in DPB218 (366).
Fig. 10 is a flowchart illustrating an example method for decoding a current block of video data in accordance with the techniques of this disclosure. The current block may include the current CU. Although described with respect to video decoder 300 (fig. 1 and 8), it should be understood that other devices may be configured to perform a method similar to the method of fig. 10.
The video decoder 300 may receive entropy encoded data for the current block, such as entropy encoded prediction information and entropy encoded data for transform coefficients of a residual block corresponding to the current block (370). The video decoder 300 may entropy decode the entropy encoded data to determine prediction information for the current block and reproduce transform coefficients of the residual block (372). The video decoder 300 may predict the current block, for example, using an intra prediction mode indicated by the prediction information for the current block (374), to calculate a prediction block for the current block. The video decoder 300 may then inverse scan the reproduced transform coefficients (376) to create a block of quantized transform coefficients. The video decoder 300 may use the intra prediction mode and the block size to determine the MTS scheme for the block according to any of the various techniques of this disclosure. The video decoder 300 may then inverse quantize the transform coefficients, and apply an inverse transform to the transform coefficients using the determined MTS scheme, to generate the residual block (378). The video decoder 300 may finally decode the current block by combining the prediction block and the residual block (380).
Fig. 11 is a flowchart illustrating an example method of decoding a block of video data in accordance with the techniques of this disclosure. For purposes of example, the method of fig. 11 may be performed by the video decoder 300 (fig. 1 and 8), and explained with respect to the video decoder 300. The video encoder 200 in fig. 1 and 7, as well as other video coding (encoding and/or decoding devices), may be configured to perform this or a similar method. The method of fig. 11 may be performed as part of the method of fig. 9 (e.g., steps 354 and/or 362) or as part of the method of fig. 10 (e.g., step 378).
Initially, video decoder 300 determines the size of the current block of video data (400). For example, the video decoder 300 may determine the width W and the height H of the current block, where W and H represent the number of samples along the corresponding dimension of the current block. The video decoder 300 may also determine a current intra prediction mode for the current block (402). For example, the video decoder 300 may decode one or more intra prediction mode syntax elements that represent an intra prediction mode of the current block. Alternatively, video decoder 300 may determine the intra-prediction mode using any of the various techniques discussed above with respect to fig. 3-6.
The video decoder 300 may then determine a mode group comprising the determined intra-prediction modes (404). For example, the video decoder 300 may determine the mode group according to table 2 above. In other examples, other groupings of patterns may be used, which may include more or fewer groups and/or more or fewer patterns in each group.
The video decoder 300 may then determine a set of available MTS schemes for the current block based on the mode group and the size of the current block (406). For example, the video decoder 300 may determine the set of available MTS schemes from table 3 above. That is, the set of available MTS schemes may be one of a plurality of sets of available MTS schemes. As described above, each of the sets (also referred to as "groups") may include four MTS schemes. For example, there may be 80 different sets of MTS schemes, as shown by the example g_aucTrSet data structure above. Each of the available MTS scheme sets may include a value representing the MTS scheme, such as an index to 25 possible MTS scheme sets, e.g., as discussed above with respect to the g_auctridxttr data structure.
The video decoder 300 may also determine one MTS scheme (408) of the determined set of MTS schemes to apply to the current block, i.e., the transformed block of the current block. For example, the video decoder 300 may decode a transform index that indicates which of the four MTS schemes of the determined set of MTS schemes is to be applied to the current block.
The video decoder 300 may then apply the determined MTS scheme to the transform block of the current block (410). For example, the video decoder 300 may apply a vertical transform and a horizontal transform of the MTS scheme to the transform blocks. Application of the MTS scheme may result in a reproduced residual block. The video decoder 300 may then use the residual block to decode the current block (412), e.g., by combining the residual block with the prediction block on a sample-by-sample basis.
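As an illustration of what applying the vertical and horizontal transforms of an MTS scheme entails, the sketch below uses floating-point orthonormal DCT-II matrices as a stand-in for the scheme's actual integer transform kernels; the matrices, the toy 2x2 block, and the function names are assumptions for illustration:

```python
# Sketch of a separable inverse transform: the vertical transform is applied
# along columns and the horizontal transform along rows. Orthonormal DCT-II
# matrices stand in for the MTS scheme's actual integer kernels.
import math

def dct2_matrix(n):
    # Orthonormal DCT-II basis; row k holds the k-th basis vector.
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
             for j in range(n)] for k in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def inverse_transform(coeffs, vert, horiz):
    # Forward: C = V . R . H^T, so the inverse is R = V^T . C . H.
    return matmul(matmul(transpose(vert), coeffs), horiz)

V, H = dct2_matrix(2), dct2_matrix(2)
R = [[1.0, 2.0], [3.0, 4.0]]             # toy residual block
C = matmul(matmul(V, R), transpose(H))   # forward transform
R2 = inverse_transform(C, V, H)          # recovers R (up to rounding)
```

Because the transforms are separable, an MTS scheme can pair different kernels vertically and horizontally (e.g., DST-7 with DCT-8) without changing this structure.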
In this way, the method of fig. 11 represents an example of a method of decoding video data, including: determining a size of a current block of video data; determining an intra prediction mode for a current block of video data; determining a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes, such that each possible intra prediction mode is not included in more than one of the mode groups; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to a size of the current block and an intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from a set of available MTS schemes based on the determined pattern groups; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decoding the current block using the residual block.
Fig. 12 is a flowchart illustrating an example method of decoding a block of video data in accordance with the techniques of this disclosure. For purposes of example, the method of fig. 12 may be performed by the video decoder 300 (fig. 1 and 8), and explained with respect to the video decoder 300. The video encoder 200 in fig. 1 and 7, as well as other video coding (encoding and/or decoding devices), may be configured to perform this or a similar method. The method of fig. 12 may be performed as part of the method of fig. 9 (e.g., steps 354 and/or 362) or as part of the method of fig. 10 (e.g., step 378). In some examples, the method of fig. 12 may be performed with the method of fig. 11.
Initially, video decoder 300 determines the size of the current block of video data (420). For example, the video decoder 300 may determine the width W and the height H of the current block, where W and H represent the number of samples along the corresponding dimension of the current block. The video decoder 300 may also determine a current intra prediction mode for the current block (422). For example, the video decoder 300 may decode one or more intra prediction mode syntax elements that represent an intra prediction mode of the current block. Alternatively, video decoder 300 may determine the intra-prediction mode using any of the various techniques discussed above with respect to fig. 3-6.
The video decoder 300 may then determine a size group including the determined size of the current block (424). For example, the video decoder 300 may determine the size group according to table 1 above. In other examples, other size groupings may be used, which may include more or fewer groups and/or more or fewer sizes in each group.
The video decoder 300 may then determine the set of available MTS schemes for the current block based on the size group and the intra prediction mode of the current block (426). For example, the video decoder 300 may determine the set of available MTS schemes from table 3 above. That is, the set of available MTS schemes may be one of a plurality of sets of available MTS schemes. As described above, each of the sets (also referred to as "groups") may include four MTS schemes. For example, there may be 80 different sets of MTS schemes, as shown by the example g_aucTrSet data structure above.
The video decoder 300 may also determine one MTS scheme (428) of the determined set of MTS schemes to apply to the current block, i.e., the transformed block of the current block. For example, the video decoder 300 may decode a transform index that indicates which of the four MTS schemes of the determined set of MTS schemes is to be applied to the current block.
The video decoder 300 may then apply the determined MTS scheme to the transform block of the current block (430). For example, the video decoder 300 may apply a vertical transform and a horizontal transform of the MTS scheme to the transform blocks. Application of the MTS scheme may result in a reproduced residual block. The video decoder 300 may then use the residual block to decode the current block (432), e.g., by combining the residual block with the prediction block on a sample-by-sample basis.
Fig. 13 is a flowchart illustrating an example method of decoding a block of video data in accordance with the techniques of this disclosure. For purposes of example, the method of fig. 13 may be performed by the video decoder 300 (fig. 1 and 8), and explained with respect to the video decoder 300. The video encoder 200 in fig. 1 and 7, as well as other video coding (encoding and/or decoding devices), may be configured to perform this or a similar method. The method of fig. 13 may be performed as part of the method of fig. 9 (e.g., steps 354 and/or 362) or as part of the method of fig. 10 (e.g., step 378).
First, the video decoder 300 determines the size WxH of the current block of video data (440). For example, the video decoder 300 may determine the width W and the height H of the current block, where W and H represent the number of samples along the corresponding dimension of the current block. The video decoder 300 may also determine a current intra prediction mode for the current block (442). For example, the video decoder 300 may decode one or more intra prediction mode syntax elements that represent an intra prediction mode of the current block. Alternatively, video decoder 300 may determine the intra-prediction mode using any of the various techniques discussed above with respect to fig. 3-6.
The video decoder 300 may then determine a symmetric size (HxW) and a symmetric intra prediction mode (444). In particular, if the actual size and intra prediction mode of the current block are not included in table 3, the video decoder 300 may determine the MTS scheme using the symmetric size and intra prediction mode. By simply inverting WxH to HxW, the video decoder 300 can obtain the symmetric block size. The symmetric intra prediction mode may be determined by mirroring modes 2 through 34 as shown in fig. 2, where the mirror axis is parallel to mode 34 and passes through the upper-left and lower-right corners of the block in fig. 2. Thus, for example, mode 66 is symmetric to mode 2, mode 65 is symmetric to mode 3, and so on, up to mode 35 being symmetric to mode 33 (while mode 34 is symmetric to itself).
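The symmetric lookup can be sketched as follows. The closed form 68 - m is inferred here from the stated pairings (66 with 2, 65 with 3, and so on to 35 with 33, with 34 mapping to itself) and should be read as an assumption; it also reproduces the 8x32/mode-58 to 32x8/mode-10 example discussed below:

```python
# Sketch of the size/mode symmetry: a WxH block with angular mode m maps to
# the HxW block with the mirrored mode. The 68 - m closed form is inferred
# from the stated mode pairings, not quoted from the disclosure.

def symmetric_lookup_key(w, h, mode):
    if 2 <= mode <= 66:                # angular modes mirror about mode 34
        return (h, w, 68 - mode)       # swap dimensions, mirror the mode
    return (h, w, mode)                # planar/DC (0, 1) are unchanged

print(symmetric_lookup_key(8, 32, 58))  # -> (32, 8, 10)
```

Exploiting this symmetry roughly halves the number of (size, mode) entries the MTS tables need to store.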
The video decoder 300 may then determine the set of available MTS schemes for the current block based on the symmetric block size (HxW) and the symmetric intra prediction mode (446). For example, the video decoder 300 may determine the set of available MTS schemes from table 3 above. That is, the set of available MTS schemes may be one of a plurality of sets of available MTS schemes. As described above, each of the sets (also referred to as "groups") may include four MTS schemes. For example, there may be 80 different sets of MTS schemes, as shown by the example g_aucTrSet data structure above. Each of the available MTS scheme sets may include a value representing the MTS scheme, such as an index to 25 possible MTS scheme sets, e.g., as discussed above with respect to the g_auctridxttr data structure. For example, according to the example of table 3, if the current block has a size of 8x32 and an intra prediction mode of 58, the video decoder 300 may determine that the symmetric size is 32x8, the symmetric intra prediction mode is 10, and the MTS scheme set is the 66th entry of the g_aucTrSet data structure.
The video decoder 300 may also determine one MTS scheme of the determined set of MTS schemes to apply to the current block, i.e., to the transform block of the current block (448). For example, the video decoder 300 may decode a transform index that indicates which of the four MTS schemes of the determined set is to be applied to the current block.
The video decoder 300 may then apply the determined MTS scheme to the transform block of the current block (450). For example, the video decoder 300 may apply a vertical transform and a horizontal transform of the MTS scheme to the transform blocks. Application of the MTS scheme may result in a reproduced residual block. The video decoder 300 may then use the residual block to decode the current block (452), e.g., by combining the residual block with the prediction block on a sample-by-sample basis.
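As a concrete illustration of applying a separable vertical/horizontal transform pair to a transform block, the sketch below uses orthonormal DCT-II matrices as stand-ins for whichever MTS kernels (e.g., DST7, DCT8) the decoded index selects; the helper names are hypothetical and this is not the patent's implementation.

```python
import math

def dct2_matrix(n):
    """Orthonormal DCT-II basis as a nested list (stand-in for an MTS kernel)."""
    scale = math.sqrt(2.0 / n)
    m = [[scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
          for x in range(n)] for k in range(n)]
    m[0] = [v / math.sqrt(2.0) for v in m[0]]
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def forward_separable(residual, v, h):
    # vertical transform applied to columns, then horizontal transform to rows
    return matmul(matmul(v, residual), transpose(h))

def inverse_separable(coeffs, v, h):
    # for orthonormal kernels the inverse transform is the transpose,
    # reproducing the residual block from the transform block
    return matmul(matmul(transpose(v), coeffs), h)
```

A forward transform followed by the inverse reproduces the residual, mirroring the decoder step (450) above.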
Various examples of the techniques of the present disclosure are summarized in the following clauses:
clause 1: a method of decoding video data, the method comprising: determining a size of a current block of video data; determining an intra-prediction mode of the current block of video data; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from the set of available MTS schemes; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block for the current block; and decoding the current block using the residual block.
Clause 2: the method of clause 1, wherein the size of the current block comprises a size group according to a width of the current block and a height of the current block.
Clause 3: the method of clause 2, wherein the size group of the current block is selected from one of a plurality of size groups including 4x4, 4x8, 4x16, 4xN, 8x4, 8x8, 8x16, 8xN, 16x4, 16x8, 16x16, 16xN, Nx4, Nx8, Nx16, NxN, where N is an integer power of 2 greater than 16.
Clause 4: the method of clause 3, wherein determining the set of available MTS schemes according to the size of the current block comprises: determining the set of available MTS schemes according to the size group of the current block.
Clause 5: the method of any of clauses 1-4, wherein determining the intra-prediction mode comprises: determining a mode group including the intra-prediction mode, and wherein determining the set of available MTS schemes from the intra-prediction mode of the current block comprises: determining the set of available MTS schemes from the mode group for the current block.
Clause 6: the method of clause 5, wherein the pattern group is selected from one of a plurality of pattern groups, the plurality of pattern groups comprising: a first group comprising intra prediction modes 0 and 1, a second group comprising intra prediction modes 2 to 12, a third group comprising intra prediction modes 13 to 23, a fourth group comprising intra prediction modes 24 to 34, and a fifth group comprising Matrix Intra Prediction (MIP) modes.
Clause 7: the method of any of clauses 1-6, further comprising: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, wherein determining the MTS scheme comprises: the MTS scheme is determined using the MTS index value.
Clause 8: the method of clause 7, wherein the MTS index value has a value between 0 and 3, inclusive, wherein the plurality of MTS scheme sets includes:
{ }, {1,11,17,21}, { }, { }, { }, { }, { }, { }, { }, {1,11,17,21}, { }, { }, { }, { }, { }, { }, {7,8,9,11}, {6,7,11,12}, { }, { }, { }, { }, {6,7,9,12}, { }, { }, { }, {6,7,11,12}, { }, {6,7,11,12}, { } and {6,7,11,12}, and wherein the MTS index indicates a pair of transforms in the set of available MTS schemes according to:
{DCT8,DCT8},{DCT8,DST7},{DCT8,DCT5},{DCT8,DST4},{DCT8,DST1},{DST7,DCT8},{DST7,DST7},{DST7,DCT5},{DST7,DST4},{DST7,DST1},{DCT5,DCT8},{DCT5,DST7},{DCT5,DCT5},{DCT5,DST4},{DCT5,DST1},{DST4,DCT8},{DST4,DST7},{DST4,DCT5},{DST4,DST4},{DST4,DST1},{DST1,DCT8},{DST1,DST7},{DST1,DCT5},{DST1,DST4},{DST1,DST1}.
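The 25 transform pairs listed above enumerate the 5x5 cross product of the kernels {DCT8, DST7, DCT5, DST4, DST1}; a hypothetical helper (names are illustrative, not from the text) recovering a pair from its index could look like:

```python
KERNELS = ["DCT8", "DST7", "DCT5", "DST4", "DST1"]

def transform_pair(idx):
    """Map a pair index 0..24 to its two kernels, in the order listed above."""
    if not 0 <= idx <= 24:
        raise ValueError("pair index out of range")
    return KERNELS[idx // 5], KERNELS[idx % 5]
```

For example, index 0 gives {DCT8, DCT8} and index 24 gives {DST1, DST1}, matching the first and last entries of the list.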
Clause 9: the method of any of clauses 1-8, wherein each MTS scheme set of the set of MTS schemes comprises four corresponding transform pair selections.
Clause 10: the method of any of clauses 1-9, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on the shape of the current block.
Clause 11: the method of any of clauses 1-10, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on quantization parameters of the current block.
Clause 12: the method of any of clauses 1-11, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, and wherein the intra-prediction mode comprises an angular intra-prediction mode I1, the method further comprising: determining that a second block has a size of HxW; determining that the second block has an intra-prediction mode of (68-I1); determining the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determining the MTS scheme for the second block; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
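The symmetry reuse in clause 12 amounts to swapping the roles of the two kernels when the mirrored HxW block is processed; a minimal sketch, with hypothetical names:

```python
def mirrored_scheme(horizontal, vertical):
    """For the HxW mirror block (mode 68-I1), the original horizontal kernel
    is applied vertically and the vertical kernel horizontally."""
    return {"vertical": horizontal, "horizontal": vertical}
```

So a WxH scheme of {DST7 horizontal, DCT8 vertical} becomes {DCT8 horizontal, DST7 vertical} for the HxW mirror block.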
Clause 13: the method of any of clauses 1-11, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, and wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, the method further comprising: determining that a second block has a size of HxW; determining that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determining the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determining the MTS scheme for the second block from the set of available MTS schemes; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 14: the method of any of clauses 1-11, wherein, when the current block is coded using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode, determining the set of available MTS schemes comprises: determining the set of available MTS schemes according to a dominant angular mode determined using the DIMD mode.
Clause 15: the method of clause 14, wherein the dominant angular mode comprises the angular mode with the highest weight.
Clause 16: the method of any of clauses 1-11, wherein, when the current block is coded using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode, determining the set of available MTS schemes comprises: determining whether a difference between two angular mode values is above a threshold; when the difference is above the threshold, determining the intra-prediction mode as a planar mode when determining the set of available MTS schemes; or, when the difference is less than or equal to the threshold, determining the intra-prediction mode as a dominant angular mode determined using the DIMD mode.
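The DIMD fallback in clause 16 can be sketched as a small selector; the function name, threshold value, and mode numbering are illustrative assumptions, not specified by the text.

```python
PLANAR_MODE = 0

def dimd_mode_for_mts(angular_a, angular_b, dominant, threshold):
    """Fall back to planar when the two DIMD angular modes disagree by more
    than the threshold; otherwise use the dominant angular mode."""
    if abs(angular_a - angular_b) > threshold:
        return PLANAR_MODE
    return dominant
```

With a hypothetical threshold of 16, modes 10 and 40 would fall back to planar, while modes 18 and 22 would keep the dominant mode.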
Clause 17: the method of any of clauses 1-11, wherein when the intra-prediction mode comprises a wide-angle intra-prediction mode, determining the set of available MTS schemes from the intra-prediction mode comprises: the set of available MTS schemes is determined from a conventional intra-prediction mode having an angle closest to an angle of the wide-angle intra-prediction mode.
Clause 18: the method of any of clauses 1-17, wherein determining the set of available MTS schemes according to the size of the current block and the intra-prediction mode comprises: the set of available MTS schemes is determined according to the following table:
Size \ Mode | [0,1] | [2-12] | [13-23] | [24-34] | MIP |
4x4 | 0 | 1 | 2 | 3 | 4 |
4x8 | 5 | 6 | 7 | 8 | 9 |
4x16 | 10 | 11 | 12 | 13 | 14 |
4xN | 15 | 16 | 17 | 18 | 19 |
8x4 | 20 | 21 | 22 | 23 | 24 |
8x8 | 25 | 26 | 27 | 28 | 29 |
8x16 | 30 | 31 | 32 | 33 | 34 |
8xN | 35 | 36 | 37 | 38 | 39 |
16x4 | 40 | 41 | 42 | 43 | 44 |
16x8 | 45 | 46 | 47 | 48 | 49 |
16x16 | 50 | 51 | 52 | 53 | 54 |
16xN | 55 | 56 | 57 | 58 | 59 |
32x4 | 60 | 61 | 62 | 63 | 64 |
32x8 | 65 | 66 | 67 | 68 | 69 |
32x16 | 70 | 71 | 72 | 73 | 74 |
32xN | 75 | 76 | 77 | 78 | 79 |
Where N is an integer value equal to or greater than 32.
Clause 19: the method of any of clauses 1-18, wherein determining the intra-prediction mode comprises: the intra-prediction mode is determined according to a template-based intra-mode derivation (TIMD) mode.
Clause 20: the method of clause 19, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a dominant intra-prediction mode of the two intra-prediction modes.
Clause 21: the method of clause 19, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: when the difference between the two intra-prediction modes is above a threshold, determining the set of available MTS schemes comprises: determining the set of available MTS schemes according to a planar mode; or when the difference between the two intra-prediction modes is less than or equal to the threshold, determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a dominant intra-prediction mode of the two intra-prediction modes.
Clause 22: the method of any of clauses 20 and 21, wherein the dominant intra-prediction mode comprises the intra-prediction mode of the two intra-prediction modes that produces lower distortion.
Clause 23: the method of any of clauses 19-22, wherein determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a table that angularly maps extended intra-prediction modes to the set of available MTS schemes.
Clause 24: a method of decoding video data, the method comprising: determining a size of a current block of video data; determining an intra-prediction mode for the current block of video data; determining a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group of the plurality of mode groups including a respective set of intra-prediction modes such that no possible intra-prediction mode is included in more than one of the mode groups; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from the set of available MTS schemes based on the determined mode group; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decoding the current block using the residual block.
Clause 25: the method of clause 24, wherein the plurality of pattern groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including Matrix Intra Prediction (MIP) mode.
Clause 26: the method of clause 24, wherein the size of the current block includes a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 27: the method of clause 26, wherein the size group of the current block is selected from one of a plurality of size groups including 4x4, 4x8, 4x16, 4xN, 8x4, 8x8, 8x16, 8xN, 16x4, 16x8, 16x16, 16xN, Nx4, Nx8, Nx16, NxN, where N is an integer power of 2 greater than 16.
Clause 28: the method of clause 27, wherein determining the set of available MTS schemes according to the size of the current block comprises: determining the set of available MTS schemes according to the size group of the current block.
Clause 29: the method of clause 24, further comprising: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, wherein determining the MTS scheme comprises: the MTS scheme is determined using the MTS index value.
Clause 30: the method of clause 29, wherein the MTS index value has a value between 0 and 3, inclusive, wherein the plurality of sets of MTS schemes comprises:
{ }, {1,11,17,21}, { }, { }, { }, { }, { }, { }, { }, {1,11,17,21}, { }, { }, { }, { }, { }, { }, {7,8,9,11}, {6,7,11,12}, { }, { }, { }, { }, {6,7,9,12}, { }, { }, { }, {6,7,11,12}, { }, {6,7,11,12}, { } and {6,7,11,12}, and wherein the MTS index indicates a pair of transforms in the set of available MTS schemes according to:
{DCT8,DCT8},{DCT8,DST7},{DCT8,DCT5},{DCT8,DST4},{DCT8,DST1},{DST7,DCT8},{DST7,DST7},{DST7,DCT5},{DST7,DST4},{DST7,DST1},{DCT5,DCT8},{DCT5,DST7},{DCT5,DCT5},{DCT5,DST4},{DCT5,DST1},{DST4,DCT8},{DST4,DST7},{DST4,DCT5},{DST4,DST4},{DST4,DST1},{DST1,DCT8},{DST1,DST7},{DST1,DCT5},{DST1,DST4},{DST1,DST1}.
Clause 31: the method of clause 24, wherein each of the sets of MTS schemes comprises four corresponding transform pair selections.
Clause 32: the method of clause 24, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on the shape of the current block.
Clause 33: the method of clause 24, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on quantization parameters of the current block.
Clause 34: the method of clause 24, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, and wherein the intra-prediction mode comprises an angular intra-prediction mode I1, the method further comprising: determining that a second block has a size of HxW; determining that the second block has an intra-prediction mode of (68-I1); determining the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determining the MTS scheme for the second block; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 35: the method of clause 24, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, and wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, the method further comprising: determining that a second block has a size of HxW; determining that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determining the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determining the MTS scheme for the second block from the set of available MTS schemes; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 36: the method of clause 24, wherein determining the set of available MTS schemes from the intra-prediction mode when coding the current block using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode comprises: the set of available MTS schemes is determined according to a dominant angular mode determined using the DIMD mode.
Clause 37: the method of clause 36, wherein the dominant angular mode comprises the angular mode with the highest weight.
Clause 38: the method of clause 24, wherein, when the current block is coded using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode, determining the set of available MTS schemes comprises: determining whether a difference between two angular mode values is above a threshold; when the difference is above the threshold, determining the intra-prediction mode as a planar mode when determining the set of available MTS schemes; or, when the difference is less than or equal to the threshold, determining the intra-prediction mode as a dominant angular mode determined using the DIMD mode.
Clause 39: the method of clause 24, wherein when the intra-prediction mode comprises a wide-angle intra-prediction mode, determining the set of available MTS schemes from the intra-prediction mode comprises: the set of available MTS schemes is determined from a conventional intra-prediction mode having an angle closest to an angle of the wide-angle intra-prediction mode.
Clause 40: the method of clause 24, wherein determining the set of available MTS schemes according to the size of the current block and the intra-prediction mode comprises: the set of available MTS schemes is determined according to the following table:
Size \ Mode | [0,1] | [2-12] | [13-23] | [24-34] | MIP |
4x4 | 0 | 1 | 2 | 3 | 4 |
4x8 | 5 | 6 | 7 | 8 | 9 |
4x16 | 10 | 11 | 12 | 13 | 14 |
4xN | 15 | 16 | 17 | 18 | 19 |
8x4 | 20 | 21 | 22 | 23 | 24 |
8x8 | 25 | 26 | 27 | 28 | 29 |
8x16 | 30 | 31 | 32 | 33 | 34 |
8xN | 35 | 36 | 37 | 38 | 39 |
16x4 | 40 | 41 | 42 | 43 | 44 |
16x8 | 45 | 46 | 47 | 48 | 49 |
16x16 | 50 | 51 | 52 | 53 | 54 |
16xN | 55 | 56 | 57 | 58 | 59 |
32x4 | 60 | 61 | 62 | 63 | 64 |
32x8 | 65 | 66 | 67 | 68 | 69 |
32x16 | 70 | 71 | 72 | 73 | 74 |
32xN | 75 | 76 | 77 | 78 | 79 |
Where N is an integer value equal to or greater than 32.
Clause 41: the method of clause 24, wherein determining the intra-prediction mode comprises: the intra-prediction mode is determined according to a template-based intra-mode derivation (TIMD) mode.
Clause 42: the method of clause 41, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a dominant intra-prediction mode of the two intra-prediction modes.
Clause 43: the method of clause 41, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: when the difference between the two intra-prediction modes is above a threshold, determining the set of available MTS schemes comprises: determining the set of available MTS schemes according to a planar mode; or when the difference between the two intra-prediction modes is less than or equal to the threshold, determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a dominant intra-prediction mode of the two intra-prediction modes.
Clause 44: the method of clause 43, wherein the dominant intra-prediction mode comprises the intra-prediction mode of the two intra-prediction modes that produces lower distortion.
Clause 45: the method of clause 43, wherein determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a table that angularly maps extended intra-prediction modes to the set of available MTS schemes.
Clause 46: the method of clause 24, wherein decoding the current block comprises: forming a prediction block for the current block using the intra-prediction mode; and adding samples of the prediction block to corresponding samples of the residual block.
Clause 47: the method of clause 24, further comprising: encoding the current block prior to decoding the current block.
Clause 48: an apparatus for decoding video data, the apparatus comprising: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group of the plurality of mode groups including a respective set of intra-prediction modes such that no possible intra-prediction mode is included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
Clause 49: the apparatus of clause 48, wherein the plurality of pattern groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including Matrix Intra Prediction (MIP) mode.
Clause 50: the apparatus of clause 48, wherein the size of the current block includes a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 51: the apparatus of clause 48, wherein the one or more processors are further configured to: decoding MTS index values representing the MTS schemes in the set of available MTS schemes, and wherein the one or more processors are configured to: the MTS scheme is determined using the MTS index value.
Clause 52: the apparatus of clause 48, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, and wherein the one or more processors are further configured to: determine that a second block has a size of HxW; determine that the second block has an intra-prediction mode of (68-I1); determine the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determine the MTS scheme for the second block; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 53: the apparatus of clause 48, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, and wherein the one or more processors are further configured to: determine that a second block has a size of HxW; determine that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determine the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determine the MTS scheme for the second block from the set of available MTS schemes; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 54: the apparatus of clause 48, wherein the one or more processors are further configured to: encode the current block prior to decoding the current block.
Clause 55: the apparatus of clause 48, further comprising: a display configured to display the decoded video data.
Clause 56: the device of clause 48, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
Clause 57: a computer-readable storage medium having instructions stored thereon that, when executed, cause a processor of a device for decoding video data to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group of the plurality of mode groups including a respective set of intra-prediction modes such that no possible intra-prediction mode is included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
Clause 58: the computer-readable storage medium of clause 57, wherein the plurality of pattern groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including Matrix Intra Prediction (MIP) mode.
Clause 59: the computer-readable storage medium of clause 57, wherein the size of the current block comprises a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 60: the computer-readable storage medium of clause 57, further comprising instructions that cause the processor to: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, and wherein the instructions that cause the processor to determine the MTS scheme include instructions that cause the processor to determine the MTS scheme using the MTS index value.
Clause 61: the computer-readable storage medium of clause 57, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, and further comprising instructions that cause the processor to: determine that a second block has a size of HxW; determine that the second block has an intra-prediction mode of (68-I1); determine the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determine the MTS scheme for the second block; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 62: the computer-readable storage medium of clause 57, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, and further comprising instructions that cause the processor to: determine that a second block has a size of HxW; determine that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determine the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determine the MTS scheme for the second block from the set of available MTS schemes; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 63: the computer-readable storage medium of clause 57, further comprising instructions that cause the processor to: encode the current block prior to decoding the current block.
Clause 64: an apparatus for decoding video data, the apparatus comprising: means for determining a size of a current block of video data; means for determining an intra-prediction mode for the current block of video data; means for determining a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group including a respective set of intra-prediction modes such that no possible intra-prediction mode is included in more than one of the mode groups; means for determining a set of available Multiple Transform Selection (MTS) schemes for the current block based on the size of the current block and the intra-prediction mode, the set of available MTS schemes being one of a plurality of sets of MTS schemes; means for determining an MTS scheme from the set of available MTS schemes based on the determined mode group; means for applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and means for decoding the current block using the residual block.
Clause 65: a method of decoding video data, the method comprising: determining a size of a current block of video data; determining an intra-prediction mode for the current block of video data; determining a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group of the plurality of mode groups including a respective set of intra-prediction modes such that no possible intra-prediction mode is included in more than one of the mode groups; determining an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determining an MTS scheme from the set of available MTS schemes based on the determined mode group; applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decoding the current block using the residual block.
Clause 66: the method of clause 65, wherein the plurality of mode groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including a Matrix Intra Prediction (MIP) mode.
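As an illustrative, non-normative sketch of the disjoint mode grouping in Clause 66 (the function name and the MIP sentinel value are hypothetical, not part of the clause):

```python
MIP_MODE = -1  # hypothetical sentinel standing in for the Matrix Intra Prediction mode

def mode_group(intra_mode: int) -> int:
    """Map an intra-prediction mode to one of the five disjoint mode groups
    of Clause 66 (group indices 0..4 are illustrative labels)."""
    if intra_mode == MIP_MODE:
        return 4            # fifth group: MIP
    if intra_mode in (0, 1):
        return 0            # first group: planar / DC
    if 2 <= intra_mode <= 12:
        return 1            # second group
    if 13 <= intra_mode <= 23:
        return 2            # third group
    if 24 <= intra_mode <= 34:
        return 3            # fourth group
    raise ValueError(f"mode {intra_mode} outside the grouped range")
```

Because the groups are disjoint, each mode maps to exactly one group, as the clause requires.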
Clause 67: the method of any of clauses 65 and 66, wherein the size of the current block includes a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 68: the method of clause 67, wherein the size group of the current block is selected from one of a plurality of size groups including 4x4, 4x8, 4x16, 4xN, 8x4, 8x8, 8x16, 8xN, 16x4, 16x8, 16x16, 16xN, Nx4, Nx8, Nx16, NxN, where N is an integer power of 2 greater than 16.
Clause 69: the method of clause 68, wherein determining the set of available MTS schemes according to the size of the current block comprises: the set of available MTS schemes is determined according to the size group of the current block.
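A minimal sketch of the size grouping of Clauses 67-69, under the assumption that block dimensions are powers of two no smaller than 4; the function names are illustrative, not part of the clauses:

```python
def dim_bucket(d: int) -> str:
    """Quantize one block dimension to {4, 8, 16, N}, where N stands for
    any integer power of 2 greater than 16 (per Clause 68)."""
    assert d >= 4 and d & (d - 1) == 0, "dimensions assumed to be powers of two"
    return str(d) if d <= 16 else "N"

def size_group(width: int, height: int) -> str:
    """Form the size-group label (e.g. '8x16', 'NxN') used to index the
    available MTS scheme sets."""
    return f"{dim_bucket(width)}x{dim_bucket(height)}"
```

For example, a 32x16 block falls in the "Nx16" group, so any two blocks whose dimensions quantize alike share one set of available MTS schemes.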
Clause 70: the method of any of clauses 65-69, further comprising: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, wherein determining the MTS scheme comprises: the MTS scheme is determined using the MTS index value.
Clause 71: the method of clause 70, wherein the MTS index value has a value between 0 and 3, including 0 and 3, wherein the plurality of MTS scheme sets comprises:
{ }, {1,11,17,21}, { }, { }, {, { }, { }, {, { }, {1,11,17,21}, { }, { }, { }, { }, {, { }, {7,8,9,11}, {6,7,11,12}, { }, { }, { }, { }, {6,7,9,12}, { }, { }, { }, {6,7,11,12}, { }, {6,7,11,12}, { } and {6,7,11,12}, and wherein the MTS index indicates a pair of transforms in the set of available MTS schemes according to:
{DCT8,DCT8},{DCT8,DST7},{DCT8,DCT5},{DCT8,DST4},{DCT8,DST1},{DST7,DCT8},{DST7,DST7},{DST7,DCT5},{DST7,DST4},{DST7,DST1},{DCT5,DCT8},{DCT5,DST7},{DCT5,DCT5},{DCT5,DST4},{DCT5,DST1},{DST4,DCT8},{DST4,DST7},{DST4,DCT5},{DST4,DST4},{DST4,DST1},{DST1,DCT8},{DST1,DST7},{DST1,DCT5},{DST1,DST4},{DST1,DST1}.
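For illustration only: the 25 transform pairs of Clause 71 can be enumerated in the order listed, and a decoded MTS index resolved against a four-entry scheme set. Whether the set entries are 0-based indices into the pair list, and which member of each pair is the horizontal versus the vertical transform, is not stated in this excerpt; the sketch assumes 0-based indices and a (horizontal, vertical) ordering.

```python
from itertools import product

# The 25 candidate pairs, in the order given in Clause 71: the first
# transform of each pair varies slowest, the second fastest.
TRANSFORMS = ["DCT8", "DST7", "DCT5", "DST4", "DST1"]
TRANSFORM_PAIRS = list(product(TRANSFORMS, repeat=2))  # 25 pairs

def lookup_pair(available_set, mts_index):
    """Resolve a decoded MTS index (0..3) against a four-entry available
    scheme set, whose entries are assumed here to be 0-based indices into
    TRANSFORM_PAIRS."""
    assert 0 <= mts_index <= 3
    return TRANSFORM_PAIRS[available_set[mts_index]]
```

Under these assumptions, the scheme set {1,11,17,21} offers the pairs (DCT8,DST7), (DCT5,DST7), (DST4,DCT5), and (DST1,DST7) for MTS indices 0 through 3.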
Clause 72: the method of any of clauses 65-71, wherein each of the sets of MTS schemes comprises four corresponding transform pair selections.
Clause 73: the method of any of clauses 65-72, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on the shape of the current block.
Clause 74: the method of any of clauses 65-73, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on quantization parameters of the current block.
Clause 75: the method of any of clauses 65-74, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, the method further comprising: determining that the second block has a size of HxW; determining that the second block has an intra-prediction mode of (68-I1); determining the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determining the MTS scheme for the second block; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
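The symmetry of Clause 75 can be sketched as follows; the table object and function name are hypothetical, and the sketch only illustrates the mode mapping (68 minus I1) and the horizontal/vertical swap, not a full decoder:

```python
def mts_for_transposed_block(mts_table, w, h, mode_i1):
    """Given a lookup table entry stored for a WxH block with angular mode
    I1, derive the MTS selection for the symmetric HxW block with mode
    (68 - I1): same table entry, horizontal and vertical transforms swapped."""
    hor, ver = mts_table[(w, h, mode_i1)]   # (horizontal, vertical) for WxH, I1
    return {
        "size": (h, w),                      # transposed block dimensions
        "mode": 68 - mode_i1,                # mirrored angular mode
        "transforms": (ver, hor),            # swap horizontal <-> vertical
    }
```

This halves the number of table entries that need to be stored, since each non-square size/mode entry also serves its transposed counterpart.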
Clause 76: the method of any of clauses 65-74, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, the method further comprising: determining that the second block has a size of HxW; determining that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determining the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determining the MTS scheme for the second block from the set of available MTS schemes; applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and applying the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 77: the method of any of clauses 65-76, wherein, when the current block is coded using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode, determining the set of available MTS schemes from the intra-prediction mode comprises: determining the set of available MTS schemes according to a dominant angular mode determined using the DIMD mode.
Clause 78: the method of clause 77, wherein the dominant angular mode comprises the mode with the highest weight.
Clause 79: the method of any of clauses 65-78, wherein, when the current block is coded using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode, determining the set of available MTS schemes comprises: determining whether a difference between two angular mode values derived using the DIMD mode is above a threshold; when the difference is above the threshold, determining the intra-prediction mode as a planar mode when determining the set of available MTS schemes; or, when the difference is less than or equal to the threshold, determining the intra-prediction mode as a dominant angular mode determined using the DIMD mode.
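A sketch of the DIMD selection rule of Clauses 77-79, assuming the two DIMD-derived angular modes come with weights (the highest-weight mode being dominant, per Clause 78); the threshold value is a placeholder, since the excerpt does not give one:

```python
PLANAR = 0
DIMD_THRESHOLD = 8  # hypothetical placeholder; the clauses do not fix a value

def dimd_mode_for_mts(mode_a, weight_a, mode_b, weight_b):
    """Pick the intra mode used to select the MTS set when the block is
    DIMD-coded: planar if the two derived angular modes are far apart,
    otherwise the dominant (highest-weight) angular mode."""
    if abs(mode_a - mode_b) > DIMD_THRESHOLD:
        return PLANAR
    return mode_a if weight_a >= weight_b else mode_b
```

The intuition is that two widely separated angular hypotheses give a fused prediction with no single dominant direction, so the direction-neutral planar mapping is the safer choice.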
Clause 80: the method of any of clauses 65-79, wherein when the intra-prediction mode comprises a wide-angle intra-prediction mode, determining the set of available MTS schemes from the intra-prediction mode comprises: the set of available MTS schemes is determined from a conventional intra-prediction mode having an angle closest to an angle of the wide-angle intra-prediction mode.
Clause 81: the method of any of clauses 65-80, wherein determining the set of available MTS schemes from the size of the current block and the intra-prediction mode comprises: the set of available MTS schemes is determined according to the following table:
where N is an integer value equal to or greater than 32.
Clause 82: the method of any of clauses 65-81, wherein determining the intra-prediction mode comprises: the intra-prediction mode is determined according to a template-based intra-mode derivation (TIMD) mode.
Clause 83: the method of clause 82, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a dominant intra-prediction mode of the two intra-prediction modes.
Clause 84: the method of clause 82, wherein when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: determining the set of available MTS schemes according to a planar mode when the difference between the two intra-prediction modes is above a threshold; or determining the set of available MTS schemes from a dominant intra-prediction mode of the two intra-prediction modes when the difference is less than or equal to the threshold.
Clause 85: the method of clause 84, wherein the dominant intra-prediction mode comprises the intra-prediction mode of the two intra-prediction modes that produces lower distortion.
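A sketch of the TIMD-fusion rule of Clauses 84-85: the dominant mode is the one of the two fused modes with the lower (template) distortion, and a large mode difference falls back to the planar mode. The threshold value is again a placeholder not given in the excerpt:

```python
PLANAR = 0
TIMD_THRESHOLD = 8  # hypothetical placeholder; the clauses do not fix a value

def timd_mode_for_mts(mode_a, cost_a, mode_b, cost_b):
    """Pick the intra mode used to select the MTS set when the block is
    TIMD-coded with fusion: planar if the two fused modes are far apart,
    otherwise the mode with the lower template distortion (Clause 85)."""
    if abs(mode_a - mode_b) > TIMD_THRESHOLD:
        return PLANAR
    return mode_a if cost_a <= cost_b else mode_b
```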
Clause 86: the method of any one of clauses 84 and 85, wherein determining the set of available MTS schemes comprises: the set of available MTS schemes is determined from a table that maps extended angular intra-prediction modes to sets of available MTS schemes.
Clause 87: the method of any of clauses 65-86, wherein decoding the current block comprises: forming a prediction block for the current block using the intra prediction mode; and adding samples of the prediction block with corresponding samples of the residual block.
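The reconstruction step of Clause 87 amounts to an element-wise sum of prediction and residual samples. Clipping to the sample bit depth is included here as an assumption, since the clause does not mention it:

```python
def reconstruct(pred, resid, bit_depth=8):
    """Add residual samples to the corresponding prediction samples,
    clipping to the valid sample range (clipping is an assumption, not
    stated in Clause 87)."""
    hi = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), hi) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```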
Clause 88: the method of any of clauses 65-87, further comprising: the current block is encoded prior to decoding the current block.
Clause 89: an apparatus for decoding video data, the apparatus comprising: a memory configured to store video data; and one or more processors implemented in circuitry and configured to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
Clause 90: the apparatus of clause 89, wherein the plurality of mode groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including a Matrix Intra Prediction (MIP) mode.
Clause 91: the apparatus of any of clauses 89 and 90, wherein the size of the current block includes a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 92: the device of any of clauses 89-91, wherein the one or more processors are further configured to: decode an MTS index value representing the MTS scheme in the set of available MTS schemes, and wherein the one or more processors are configured to determine the MTS scheme using the MTS index value.
Clause 93: the apparatus of any of clauses 89-92, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, and wherein the one or more processors are further configured to: determine that the second block has a size of HxW; determine that the second block has an intra-prediction mode of (68-I1); determine the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determine the MTS scheme for the second block; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 94: the apparatus of any of clauses 89-93, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, and wherein the one or more processors are further configured to: determine that the second block has a size of HxW; determine that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determine the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determine the MTS scheme for the second block from the set of available MTS schemes; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 95: the device of any of clauses 89-94, wherein the one or more processors are further configured to: the current block is encoded prior to decoding the current block.
Clause 96: the apparatus of any of clauses 89-95, further comprising: a display configured to display the decoded video data.
Clause 97: the device of any of clauses 89-96, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set top box.
Clause 98: a computer-readable storage medium having instructions stored thereon that, when executed, cause a processor of a device for decoding video data to: determine a size of a current block of video data; determine an intra-prediction mode for the current block of video data; determine a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups; determine an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets; determine an MTS scheme from the set of available MTS schemes based on the determined mode group; apply a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and decode the current block using the residual block.
Clause 99: the computer-readable storage medium of clause 98, wherein the plurality of mode groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including a Matrix Intra Prediction (MIP) mode.
Clause 100: the computer-readable storage medium of any of clauses 98 and 99, wherein the size of the current block comprises a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
Clause 101: the computer-readable storage medium of any one of clauses 98-100, further comprising instructions that cause the processor to: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, and wherein the instructions that cause the processor to determine the MTS scheme include instructions that cause the processor to determine the MTS scheme using the MTS index value.
Clause 102: the computer-readable storage medium of any of clauses 98-101, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, further comprising instructions that cause the processor to: determine that the second block has a size of HxW; determine that the second block has an intra-prediction mode of (68-I1); determine the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1); determine the MTS scheme for the second block; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 103: the computer-readable storage medium of any of clauses 98-102, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, and further comprising instructions that cause the processor to: determine that the second block has a size of HxW; determine that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value; determine the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value; determine the MTS scheme for the second block from the set of available MTS schemes; apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
Clause 104: the computer-readable storage medium of any of clauses 98-103, further comprising instructions that cause the processor to: encode the current block prior to decoding the current block.
Clause 105: an apparatus for decoding video data, the apparatus comprising: means for determining a size of a current block of video data; means for determining an intra prediction mode for the current block of video data; means for determining a mode group comprising the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups comprising a respective set of intra prediction modes, such that each possible intra prediction mode is not included in more than one of the mode groups; means for determining a set of available Multiple Transform Selection (MTS) schemes for the current block based on the size of the current block and the intra-prediction mode, the set of available MTS schemes being one of a plurality of sets of MTS schemes; means for determining an MTS scheme from the set of available MTS schemes based on the determined mode group; means for applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and means for decoding the current block using the residual block.
It is to be appreciated that certain acts or events of any of the techniques described herein can be performed in a different order, may be added, combined, or omitted entirely, depending on the example (e.g., not all of the described acts or events are necessary to implement the techniques). Further, in some examples, an action or event may be performed concurrently (e.g., by multi-threaded processing, interrupt processing, or multiple processors) rather than sequentially.
In one or more examples, the described functionality may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium, and executed by a hardware-based processing unit. A computer-readable medium may include a computer-readable storage medium (which corresponds to a tangible medium such as a data storage medium) or a communication medium including any medium that facilitates transfer of a computer program from one place to another, for example, according to a communication protocol. In this manner, a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies (e.g., infrared, radio, and microwave), then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies (e.g., infrared, radio, and microwave) are included in the definition of medium. However, it should be understood that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but instead are directed to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general-purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the terms "processor" and "processing circuitry" as used herein may refer to any one of the foregoing structures or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec. Furthermore, the techniques may be implemented entirely in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses including a wireless handset, an Integrated Circuit (IC), or a set of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques but do not necessarily require realization by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperable hardware units (including one or more processors as described above) in combination with appropriate software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.
Claims (34)
1. A method of decoding video data, the method comprising:
determining a size of a current block of video data;
determining an intra-prediction mode for the current block of video data;
determining a mode group including the determined intra prediction mode, the mode group being one of a plurality of mode groups, each of the mode groups of the plurality of mode groups including a respective set of intra prediction modes such that each possible intra prediction mode is not included in more than one of the mode groups;
determining an available Multiple Transform Selection (MTS) scheme set for the current block according to the size of the current block and the intra-prediction mode, the available MTS scheme set being one of a plurality of MTS scheme sets;
determining an MTS scheme from the set of available MTS schemes based on the determined mode group;
applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and
the current block is decoded using the residual block.
2. The method of claim 1, wherein the plurality of mode groups comprises: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including a Matrix Intra Prediction (MIP) mode.
3. The method of claim 1, wherein the size of the current block comprises a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
4. The method of claim 3, wherein the size group of the current block is selected from one of a plurality of size groups including 4x4, 4x8, 4x16, 4xN, 8x4, 8x8, 8x16, 8xN, 16x4, 16x8, 16x16, 16xN, Nx4, Nx8, Nx16, NxN, where N is an integer power of 2 greater than 16.
5. The method of claim 4, wherein determining the set of available MTS schemes according to the size of the current block comprises: the set of available MTS schemes is determined according to the size group of the current block.
6. The method of claim 1, further comprising: decoding an MTS index value representing the MTS scheme in the set of available MTS schemes, wherein determining the MTS scheme comprises: the MTS scheme is determined using the MTS index value.
7. The method of claim 6, wherein the MTS index value has a value between 0 and 3, inclusive, wherein the plurality of MTS scheme sets comprises:
{17,18,23,24},
{3,7,18,22},
{2,17,18,22},
{3,15,17,18},
{3,12,18,19},
{12,18,19,23},
{2,12,17,18},
{2,17,18,22},
{2,11,17,18},
{12,18,19,23},
{12,13,16,24},
{2,11,16,23},
{2,13,17,22},
{2,11,17,21},
{13,16,19,22},
{7,12,13,18},
{1,11,12,16},
{3,13,17,22},
{1,6,12,22},
{12,13,15,16},
{18,19,23,24},
{2,17,18,24},
{3,4,17,22},
{12,18,19,23},
{12,18,19,23},
{6,12,18,24},
{2,6,12,21},
{1,11,17,22},
{3,11,16,17},
{8,12,19,23},
{7,13,16,23},
{1,6,11,12},
{1,11,17,21},
{6,11,17,21},
{8,11,14,17},
{6,11,12,21},
{1,6,11,12},
{2,6,11,12},
{1,6,11,21},
{7,11,12,16},
{8,12,19,24},
{1,13,18,22},
{2,6,17,21},
{11,12,16,19},
{8,12,17,24},
{6,12,19,21},
{6,12,13,21},
{2,16,17,21},
{6,17,19,23},
{6,12,14,17},
{6,7,11,21},
{1,11,12,16},
{1,6,11,12},
{6,11,12,21},
{7,8,9,11},
{6,7,11,12},
{6,7,11,12},
{1,11,12,16},
{6,11,17,21},
{6,7,11,12},
{12,14,18,21},
{1,11,16,22},
{1,11,16,22},
{7,13,15,16},
{1,8,12,19},
{6,7,9,12},
{2,6,12,13},
{1,12,16,21},
{7,11,16,19},
{7,8,11,12},
{6,7,11,12},
{6,7,11,12},
{1,6,11,12},
{6,7,11,16},
{6,7,11,12},
{6,7,11,12},
{6,11,12,21},
{1,6,11,12},
{6,7,11,12},
{6,7,11,12},
and wherein the MTS index indicates transform pairs in the set of available MTS schemes according to:
{DCT8,DCT8},{DCT8,DST7},{DCT8,DCT5},{DCT8,DST4},{DCT8,DST1},
{DST7,DCT8},{DST7,DST7},{DST7,DCT5},{DST7,DST4},{DST7,DST1},
{DCT5,DCT8},{DCT5,DST7},{DCT5,DCT5},{DCT5,DST4},{DCT5,DST1},
{DST4,DCT8},{DST4,DST7},{DST4,DCT5},{DST4,DST4},{DST4,DST1},
{DST1,DCT8},{DST1,DST7},{DST1,DCT5},{DST1,DST4},{DST1,DST1}.
8. the method of claim 1, wherein each of the sets of MTS schemes comprises four corresponding transform pair selections.
9. The method of claim 1, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on the shape of the current block.
10. The method of claim 1, further comprising: the number of transform pair selections in the set of available MTS schemes is determined based on quantization parameters of the current block.
11. The method of claim 1, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, the method further comprising:
determining that the second block has a size of HxW;
determining that the second block has an intra-prediction mode of (68-I1);
determining the set of available MTS schemes for the second block according to the size HxW of the second block and the intra-prediction mode (68-I1);
determining the MTS scheme for the second block;
applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and
the vertical transform of the MTS scheme is applied to the second block as a horizontal transform.
12. The method of claim 1, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, the method further comprising:
determining that the second block has a size of HxW;
determining that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value;
determining the set of available MTS schemes for the second block according to the size HxW of the second block and the MIP intra-prediction mode having the second transpose flag value;
determining the MTS scheme for the second block from the set of available MTS schemes;
applying the horizontal transform of the MTS scheme as a vertical transform to the second block; and
the vertical transform of the MTS scheme is applied to the second block as a horizontal transform.
13. The method of claim 1, wherein determining the set of available MTS schemes from the intra-prediction mode when coding the current block using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode comprises: the set of available MTS schemes is determined according to a dominant angular mode determined using the DIMD mode.
14. The method of claim 13, wherein the dominant angular mode comprises the mode with the highest weight.
15. The method of claim 1, wherein determining the set of available MTS schemes from the intra-prediction mode when coding the current block using a decoder-side intra-mode derivation and fusion intra-prediction (DIMD) mode comprises:
determining whether a difference between two angular mode values derived using the DIMD mode is above a threshold;
when the difference is above the threshold, determining the intra-prediction mode as a planar mode when determining the set of available MTS schemes; or
when the difference is less than or equal to the threshold, determining the intra-prediction mode as a dominant angular mode determined using the DIMD mode.
16. The method of claim 1, wherein when the intra-prediction mode comprises a wide-angle intra-prediction mode, determining the set of available MTS schemes from the intra-prediction mode comprises: the set of available MTS schemes is determined from a conventional intra-prediction mode having an angle closest to an angle of the wide-angle intra-prediction mode.
17. The method of claim 1, wherein determining the set of available MTS schemes according to the size of the current block and the intra-prediction mode comprises: the set of available MTS schemes is determined according to the following table:
where N is an integer value equal to or greater than 32.
18. The method of claim 1, wherein determining the intra-prediction mode comprises: determining the intra-prediction mode according to a template-based intra-mode derivation (TIMD) mode.
19. The method of claim 18, wherein, when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises: determining the set of available MTS schemes from a dominant intra-prediction mode of the two intra-prediction modes.
20. The method of claim 18, wherein, when the TIMD mode uses a fusion of two intra-prediction modes, determining the set of available MTS schemes comprises:
when a difference between the two intra-prediction modes is greater than a threshold, determining the set of available MTS schemes according to a planar mode; or
when the difference between the two intra-prediction modes is less than or equal to the threshold, determining the set of available MTS schemes from a dominant intra-prediction mode of the two intra-prediction modes.
21. The method of claim 20, wherein the dominant intra-prediction mode comprises the intra-prediction mode of the two intra-prediction modes that produces lower distortion.
22. The method of claim 20, wherein determining the set of available MTS schemes comprises: determining the set of available MTS schemes from a table that maps extended angular intra-prediction modes to sets of available MTS schemes.
23. The method of claim 1, wherein decoding the current block comprises:
forming a prediction block for the current block using the intra prediction mode; and
adding samples of the prediction block to corresponding samples of the residual block.
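The reconstruction step of claim 23 (adding prediction samples to residual samples) can be sketched as below. The clipping to the valid sample range is an assumption of typical codec behavior, not recited in the claim:

```python
def reconstruct_block(pred, residual, bit_depth=8):
    """Add residual samples to prediction samples element-wise,
    clipping to the valid sample range for the given bit depth
    (clipping is an assumed, codec-typical step)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(pred, residual)]
```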
24. The method of claim 1, further comprising: encoding the current block prior to decoding the current block.
25. An apparatus for decoding video data, the apparatus comprising:
a memory configured to store video data; and
one or more processors implemented in circuitry and configured to:
determine a size of a current block of the video data;
determine an intra-prediction mode for the current block of the video data;
determine a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group of the plurality of mode groups including a respective set of intra-prediction modes such that no intra-prediction mode is included in more than one of the mode groups;
determine a set of available Multiple Transform Selection (MTS) schemes for the current block according to the size of the current block and the intra-prediction mode, the set of available MTS schemes being one of a plurality of sets of MTS schemes;
determine an MTS scheme from the set of available MTS schemes based on the determined mode group;
apply a transform of the MTS scheme to a transform block of the current block to generate a residual block for the current block; and
decode the current block using the residual block.
26. The device of claim 25, wherein the plurality of mode groups comprise: a first mode group including intra prediction modes 0 and 1, a second group including intra prediction modes 2 to 12, a third group including intra prediction modes 13 to 23, a fourth group including intra prediction modes 24 to 34, and a fifth group including Matrix Intra Prediction (MIP) mode.
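The grouping recited in claim 26 can be expressed as a simple lookup. The group numbering below is an arbitrary illustrative labeling, not taken from the patent:

```python
def mode_group(intra_mode: int, is_mip: bool = False) -> int:
    """Map an intra-prediction mode to one of the five mode groups of
    claim 26. Group numbers are illustrative labels only; modes 0 and 1
    are planar and DC in conventional numbering."""
    if is_mip:
        return 5  # matrix intra-prediction (MIP) modes
    if intra_mode in (0, 1):
        return 1  # planar / DC
    if 2 <= intra_mode <= 12:
        return 2
    if 13 <= intra_mode <= 23:
        return 3
    if 24 <= intra_mode <= 34:
        return 4
    raise ValueError(f"mode {intra_mode} outside the claimed ranges")
```

Because the ranges are disjoint, each mode falls in exactly one group, matching the "not included in more than one of the mode groups" condition of claim 25.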
27. The apparatus of claim 25, wherein the size of the current block comprises a width of the current block and a height of the current block, and wherein the size of the current block is included in a size group.
28. The device of claim 25, wherein the one or more processors are further configured to decode an MTS index value representing the MTS scheme in the set of available MTS schemes, and wherein the one or more processors are configured to determine the MTS scheme using the MTS index value.
29. The apparatus of claim 25, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises an angular intra-prediction mode I1, and wherein the one or more processors are further configured to:
determine that a second block has a size of HxW;
determine that the second block has an intra-prediction mode of (68 - I1);
determine the set of available MTS schemes for the second block according to the HxW size of the second block and the (68 - I1) intra-prediction mode;
determine the MTS scheme for the second block;
apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and
apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
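The symmetry in claim 29 — deriving the scheme for the transposed HxW block with mode (68 - I1) by swapping the transform pair of the WxH entry — can be sketched as below. The table contents and transform names are placeholders, not values from the patent:

```python
def transposed_scheme(base_table, width, height, mode):
    """Given base_table[(W, H, mode)] = (horizontal_tx, vertical_tx) for
    a WxH block with angular mode I1, derive the entry for the transposed
    HxW block with mode (68 - I1): the same transform pair with the
    horizontal and vertical transforms swapped (claim 29)."""
    hor_tx, ver_tx = base_table[(width, height, mode)]
    return (height, width, 68 - mode), (ver_tx, hor_tx)
```

For example, if a hypothetical 8x4 entry for mode 20 holds the pair (DST7, DCT8), the 4x8 block with mode 48 would reuse it as (DCT8, DST7), so no separate table entry is needed for the transposed shape.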
30. The device of claim 25, wherein the current block comprises a first block, wherein the MTS scheme comprises a transform pair comprising a horizontal transform and a vertical transform, wherein the first block has a size of WxH, wherein W is not equal to H, wherein the intra-prediction mode comprises a matrix intra-prediction (MIP) mode having a first transpose flag value, and wherein the one or more processors are further configured to:
determine that a second block has a size of HxW;
determine that an intra-prediction mode for the second block is a MIP intra-prediction mode having a second transpose flag value different from the first transpose flag value;
determine the set of available MTS schemes for the second block according to the HxW size of the second block and the MIP intra-prediction mode having the second transpose flag value;
determine the MTS scheme for the second block from the set of available MTS schemes;
apply the horizontal transform of the MTS scheme as a vertical transform to the second block; and
apply the vertical transform of the MTS scheme as a horizontal transform to the second block.
31. The device of claim 25, wherein the one or more processors are further configured to encode the current block prior to decoding the current block.
32. The apparatus of claim 25, further comprising: a display configured to display the decoded video data.
33. The device of claim 25, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
34. An apparatus for decoding video data, the apparatus comprising:
means for determining a size of a current block of video data;
means for determining an intra prediction mode for the current block of video data;
means for determining a mode group including the determined intra-prediction mode, the mode group being one of a plurality of mode groups, each mode group including a respective set of intra-prediction modes such that no intra-prediction mode is included in more than one of the mode groups;
means for determining a set of available Multiple Transform Selection (MTS) schemes for the current block based on the size of the current block and the intra-prediction mode, the set of available MTS schemes being one of a plurality of sets of MTS schemes;
means for determining an MTS scheme from the set of available MTS schemes based on the determined mode group;
means for applying a transform of the MTS scheme to a transform block of the current block to generate a residual block of the current block; and
means for decoding the current block using the residual block.
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US63/173,884 | 2021-04-12 | ||
US63/223,377 | 2021-07-19 | ||
US17/658,803 US20220329800A1 (en) | 2021-04-12 | 2022-04-11 | Intra-mode dependent multiple transform selection for video coding |
US17/658,803 | 2022-04-11 | ||
PCT/US2022/071669 WO2022221829A1 (en) | 2021-04-12 | 2022-04-12 | Intra-mode dependent multiple transform selection for video coding |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117223283A (en) | 2023-12-12 |
Family
ID=89044851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280026724.6A Pending CN117223283A (en) | 2022-04-12 | Intra-mode dependent multiple transform selection for video coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117223283A (en) |
2022-04-12: CN CN202280026724.6A patent/CN117223283A/en status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TW202218422A (en) | Multiple neural network models for filtering during video coding | |
CN113170162B (en) | Shared candidate list and parallel candidate list derivation for video coding | |
TW202213996A (en) | Filtering process for video coding | |
CN114868387B (en) | Chroma transform skipping and joint chroma coding enablement for blocks in video coding | |
KR20230038709A (en) | Multiple adaptive loop filter sets | |
TW202308377A (en) | Signaled adaptive loop filter with multiple classifiers in video coding | |
CN115997381A (en) | Model parameter derivation for local illumination compensation in luma map domains with chroma scaling in video coding | |
CN116325729A (en) | Activation function design in neural network-based filtering process for video coding | |
TW202205865A (en) | Deblocking filter parameter signaling | |
CN113615178B (en) | Chroma intra prediction in video coding | |
CN114731403A (en) | Residual codec selection and low-level signaling based on quantization parameters | |
TW202315406A (en) | Candidate lists of multiple reference lines for video coding | |
CA3210355A1 (en) | Intra-mode dependent multiple transform selection for video coding | |
TW202306388A (en) | Motion vector candidate construction for geometric partitioning mode in video coding | |
TW202232954A (en) | Most probable modes for intra prediction for video coding | |
CN117223283A (en) | Intra-mode dependent multiple transform selection for video coding | |
TW202433932A (en) | Vector difference candidate list construction | |
TW202433933A (en) | Intra-block copy for natural video content | |
CN118830242A (en) | Overlapped Block Motion Compensation (OBMC) hybrid selection in video coding | |
CN117426097A (en) | Joint truncation operation of filters for video coding | |
CN117561712A (en) | Signaling adaptive loop filter with multiple classifiers in video coding | |
TW202404371A (en) | Neural network based filtering process for multiple color components in video coding | |
TW202431835A (en) | Adaptive loop filter classifiers | |
TW202241131A (en) | Context modeling for sign prediction for video coding | |
CN117461313A (en) | Motion vector candidate construction for geometric partitioning modes in video coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40096320; Country of ref document: HK |