WO2020139182A1 - Method and apparatus for selecting transform selection in an encoder and decoder - Google Patents

Method and apparatus for selecting transform selection in an encoder and decoder

Publication number
WO2020139182A1
WO2020139182A1 PCT/SE2019/051206 SE2019051206W WO2020139182A1 WO 2020139182 A1 WO2020139182 A1 WO 2020139182A1 SE 2019051206 W SE2019051206 W SE 2019051206W WO 2020139182 A1 WO2020139182 A1 WO 2020139182A1
Authority
WO
WIPO (PCT)
Prior art keywords
transform
video block
flag
vertical direction
horizontal direction
Prior art date
Application number
PCT/SE2019/051206
Other languages
French (fr)
Inventor
Christopher Hollmann
Jacob STRÖM
Per Wennersten
Davood SAFFAR
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Priority to US16/640,010 (US11082692B2)
Priority to JP2021537996A (JP7257523B2)
Priority to KR1020217023848A (KR20210104895A)
Priority to CN201980086072.3A (CN113302923B)
Priority to MX2021007633A (MX2021007633A)
Priority to EP19902876.2A (EP3903487A4)
Priority to RU2021122027A (RU2767513C1)
Publication of WO2020139182A1
Priority to US17/360,088 (US11558613B2)
Priority to CONC2021/0009769A (CO2021009769A2)
Priority to US18/077,414 (US11991359B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/12 Selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • H04N19/46 Embedding additional information in the video signal during the compression process
    • H04N19/60 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • the application relates to methods and apparatuses for transform selection in encoding and decoding.
  • VVC Versatile Video Coding
  • Transform Selection: This tool allows an encoder to choose between three different transforms: two variants of a Discrete Cosine Transform (DCT) and one variant of a Discrete Sine Transform (DST). During encoding, a transform is typically performed in the horizontal direction of the block, followed by a second transform in the vertical direction. These two transforms are independent of each other, so different transforms can be used in the two directions.
  • the set of transforms that can be selected from includes DCT-2, DST-7 and DCT-8 [2]
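
As a rough illustration of the separable transform described above, the following Python sketch builds orthonormal DCT-2, DST-7 and DCT-8 basis matrices from their commonly cited floating-point definitions and applies one transform along the rows (horizontal direction) and a possibly different one along the columns (vertical direction). The function names and the use of floating-point arithmetic are assumptions made for illustration only; practical codecs use scaled integer approximations.

```python
import math

def dct2(n):
    """Orthonormal DCT-II basis; row i holds the i-th basis function."""
    return [[(math.sqrt(0.5) if i == 0 else 1.0) * math.sqrt(2.0 / n)
             * math.cos(math.pi * i * (2 * j + 1) / (2 * n))
             for j in range(n)] for i in range(n)]

def dst7(n):
    """Orthonormal DST-VII basis."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.sin(math.pi * (2 * i + 1) * (j + 1) / (2 * n + 1))
             for j in range(n)] for i in range(n)]

def dct8(n):
    """Orthonormal DCT-VIII basis."""
    return [[math.sqrt(4.0 / (2 * n + 1))
             * math.cos(math.pi * (2 * i + 1) * (2 * j + 1) / (4 * n + 2))
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_2d(block, hor, ver):
    """Apply `hor` along each row, then `ver` along each column."""
    height, width = len(block), len(block[0])
    tmp = matmul(block, transpose(hor(width)))   # horizontal pass
    return matmul(ver(height), tmp)              # vertical pass

def inverse_2d(coeffs, hor, ver):
    """Undo forward_2d; valid because the bases are orthonormal."""
    height, width = len(coeffs), len(coeffs[0])
    tmp = matmul(transpose(ver(height)), coeffs)
    return matmul(tmp, hor(width))

# Example: DST-7 horizontally and DCT-8 vertically on a 4x4 block.
block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coeffs = forward_2d(block, dst7, dct8)
restored = inverse_2d(coeffs, dst7, dct8)
```

Because the horizontal and vertical passes are independent matrix multiplications, any pairing of the three bases can be plugged in without changing the surrounding code.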
  • the encoder tests all allowed combinations when selecting a transform to use. These are dependent on block type (inter/intra), block size, channel type and prediction mode. For example, for intra blocks in the luma channel with sizes between 4x4 samples and 32x32 samples, five different combinations are tested:
  • Blocks that are larger or in the chroma channel use only the DCT-2 in both directions.
  • the tool can be enabled separately for intra and inter prediction.
  • CTC Common Test Conditions
  • the tool is only enabled for intra predicted blocks.
  • the encoder uses the DCT-2 in both directions.
  • an arithmetic coder with adaptive probabilities may be used (Context-Adaptive Binary Arithmetic Coding, CABAC).
  • CABAC Context-Adaptive Binary Arithmetic Coding
  • the coder uses different contexts, each indicating a separate probability, to encode bins in the most efficient way.
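
The idea that each context carries its own adaptive probability can be sketched as follows. This is a deliberately simplified model, not the actual CABAC state machine: the class name, the exponential update rule, the adaptation rate and the clamping bound are all assumptions chosen for illustration. The cost of a bin is approximated by its ideal code length, -log2(p).

```python
import math

class AdaptiveBinModel:
    """Toy per-context probability estimate (hypothetical, not real CABAC)."""

    def __init__(self, p_one=0.5, rate=1.0 / 16.0):
        self.p_one = p_one    # current estimate of P(bin == 1)
        self.rate = rate      # assumed adaptation speed

    def cost_bits(self, bin_value):
        """Ideal code length of the bin under the current estimate."""
        p = self.p_one if bin_value else 1.0 - self.p_one
        return -math.log2(p)

    def update(self, bin_value):
        """Move the estimate toward the observed bin value."""
        target = 1.0 if bin_value else 0.0
        self.p_one += self.rate * (target - self.p_one)
        # Keep the estimate away from 0 and 1 so costs stay finite.
        self.p_one = min(max(self.p_one, 1e-6), 1.0 - 1e-6)

def total_cost(bins, contexts, pick_context):
    """Sum ideal code lengths, choosing a context per bin position."""
    total = 0.0
    for position, bin_value in enumerate(bins):
        model = contexts[pick_context(position)]
        total += model.cost_bits(bin_value)
        model.update(bin_value)
    return total
```

With a skewed bin distribution the estimate drifts toward the frequent value, so the frequent bin costs less than one bit on average, which is the efficiency gain the text attributes to context modelling.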
  • the combination chosen by the encoder is signaled as follows:
  • the flag is only signaled for luma blocks with sizes between 4x4
  • In the decoder, a corresponding process is carried out. First, the emt_cu_flag is parsed. If the flag is set, the emt_tu_idx is parsed to determine the transform to be used.
  • EMT Explicit Multiple-core Transform
  • AMT Adaptive Multicore Transform
  • a first aspect of the embodiments defines a method performed by a decoder.
  • the method comprises receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding.
  • the method comprises parsing at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction.
  • the method further comprises decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block. Responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, the method comprises parsing a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block.
  • the method comprises decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block. Responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction, the method comprises parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block. The method comprises decoding the encoded video block using the second and third transforms to generate the decoded video block.
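
The flag cascade of the first aspect can be summarised in a short sketch. The function and parameter names are hypothetical, the transforms are represented by plain labels, and the polarity of the third flag (set meaning the second transform is used horizontally) is an assumption, since the text above only says that the third flag determines the direction.

```python
def bit_reader(bits):
    """Hypothetical stand-in for the entropy decoder: yields already-decoded flags."""
    iterator = iter(bits)
    return lambda: next(iterator)

def select_decoding_transforms(read_flag,
                               first="first", second="second", third="third"):
    """Return (horizontal, vertical) transform labels from up to three flags."""
    if read_flag():
        # First flag set: first transform in both directions.
        return first, first
    if read_flag():
        # Second flag set: second transform in both directions.
        return second, second
    # Third flag: which direction uses the second transform (assumed polarity).
    if read_flag():
        return second, third
    return third, second

# Example: flags 0, 0, 1 select the mixed combination.
horizontal, vertical = select_decoding_transforms(bit_reader([0, 0, 1]))
```

Note that later flags are only read when the earlier ones are not set, so the two combinations that use a single transform in both directions need one and two flags respectively.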
  • a second aspect of the embodiments defines a decoder
  • the memory comprises instructions executable by the processor, which cause the processor to perform receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding.
  • the memory comprises instructions executable by the processor, which cause the processor to perform parsing the at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, parsing a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction, parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform decoding the encoded video block using the second and third transforms to generate the decoded video block.
  • a third aspect of the embodiments defines a computer program for a decoder.
  • the computer program comprises code means which, when run on a computer, causes the computer to receive an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding.
  • the computer program comprises code means which, when run on a computer, causes the computer to parse at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, decode the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, parse a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction, decode the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction, parse a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to decode the encoded video block using the second and third transforms to generate the decoded video block.
  • a fourth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.
  • a fifth aspect of the embodiments defines a method performed by an encoder.
  • the method comprises receiving a video block for encoding.
  • the method comprises determining a characteristic of the video block.
  • the method further comprises, responsive to the characteristic being of a type that indicates a multiple transform selection is used, selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block.
  • the method comprises testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction.
  • the method comprises selecting a combination from the combinations that provides the lowest rate distortion.
  • the method comprises encoding the video block using the selected combination to generate an encoded video block.
  • the method comprises, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encoding the video block using a default transform in the horizontal direction and the vertical direction.
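
The encoder-side selection of the fifth aspect can be sketched as below. All names are hypothetical: `combinations` is whatever list of (horizontal, vertical) pairs the encoder would otherwise test, `first` is the transform chosen as most expensive or least likely, and `rd_cost` is an assumed callable returning a rate-distortion cost for a block and a transform pair.

```python
def choose_combination(block, combinations, first, rd_cost):
    """Test every allowed pair except (first, first) and keep the cheapest."""
    tested = [(hor, ver) for (hor, ver) in combinations
              if not (hor == first and ver == first)]
    return min(tested, key=lambda pair: rd_cost(block, pair[0], pair[1]))

def select_transforms(block, uses_mts, combinations, first, default, rd_cost):
    """Return the (horizontal, vertical) pair the encoder would use."""
    if not uses_mts:
        # Characteristic indicates no multiple transform selection.
        return default, default
    return choose_combination(block, combinations, first, rd_cost)
```

Skipping the excluded pair means its rate-distortion cost is never evaluated, which is where the reduction in encoder work comes from.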
  • a sixth aspect of the embodiments defines an encoder for encoding a block of video based on a block size of the block, wherein each of a horizontal direction and a vertical direction of the block is encoded using a transform, wherein the transform can be one of a first transform, a second transform or a third transform.
  • the encoder comprises at least one processor and a memory coupled to the processor.
  • the memory comprises instructions executable by the processor, which cause the processor to perform determining a characteristic of the video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the characteristic being of a type that indicates a multiple transform selection is used, selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction.
  • the memory comprises instructions executable by the processor, which cause the processor to perform selecting a combination from the combinations that provides the lowest rate distortion.
  • the memory comprises instructions executable by the processor, which cause the processor to perform encoding the video block using the selected combination to generate an encoded video block.
  • the memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encoding the video block using a default transform in the horizontal direction and the vertical direction.
  • a seventh aspect of the embodiments defines a computer program for encoding a block of video based on a block size of the block, wherein each of a horizontal direction and a vertical direction of the block is encoded using a transform, wherein the transform can be one of a first transform, a second transform or a third transform.
  • the computer program comprises code means which, when run on a computer, causes the computer to determine a characteristic of the video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the characteristic being of a type that indicates a multiple transform selection is used, select a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to test combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction.
  • the computer program comprises code means which, when run on a computer, causes the computer to select a combination from the combinations that provides the lowest rate distortion.
  • the computer program comprises code means which, when run on a computer, causes the computer to encode the video block using the selected combination to generate an encoded video block.
  • the computer program comprises code means which, when run on a computer, causes the computer to, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encode the video block using a default transform in the horizontal direction and the vertical direction.
  • An eighth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the seventh aspect, stored on the computer readable means.
  • the advantages provided by the inventive concepts include reducing the encoder complexity by removing one of the five combinations described above. Both encoder and decoder complexity are reduced by using a less complex transform for certain block sizes. Furthermore, the efficiency of the binarization is increased as the number of bins for the most common combination (DST-7 in both directions) is reduced from 3 to 2.
  • Figure 1 is a block diagram illustrating an example of an
  • Figure 2 is a block diagram illustrating an encoder according to some embodiments.
  • Figure 3 is a block diagram illustrating a decoder according to some embodiments.
  • FIG. 4 is a block diagram illustrating components of a MTS tool
  • FIG. 5 is a block diagram illustrating components of a MTS tool according to some embodiments.
  • Figure 6 is a block diagram illustrating components of a MTS tool according to some embodiments.
  • Figures 7-11 are flow charts illustrating operations of an encoder and/or decoder in accordance with some embodiments of inventive concepts
  • FIG. 1 illustrates an example of an operating environment of an encoder 100 that may be used to encode bitstreams as described herein.
  • the encoder 100 has a multiple transform selection (MTS) component 102 used in encoding.
  • the encoder 100 receives video from network 104 and/or from storage 106 and encodes the video into bitstreams using MTS component 102 for defined block sizes of the video as described below and transmits the encoded video to decoder 108 via network 110.
  • Storage device 106 may be part of a storage depository of videos such as a storage repository of a store or a streaming video service, a separate storage component, a component of a mobile device, etc.
  • the decoder 108 may be part of a device 112 having an audio/video (A/V) media player 114.
  • the device 112 may be a mobile device, a set-top device, a desktop computer, and the like.
  • FIG. 2 is a block diagram illustrating elements of encoder 100 configured to encode video frames according to some embodiments of inventive concepts.
  • encoder 100 may include a network interface circuit 205 (also referred to as a network interface) configured to provide communications with other devices/entities/functions/etc.
  • the encoder 100 may also include a processor circuit 201 (also referred to as a processor) coupled to the network interface circuit 205, and a memory circuit 203 (also referred to as memory) coupled to the processor circuit.
  • the memory circuit 203 may include computer readable program code that when executed by the processor circuit 201 causes the processor circuit to perform operations according to embodiments disclosed herein.
  • processor circuit 201 may be defined to include memory so that a separate memory circuit is not required.
  • operations of the encoder 100 may be performed by processor 201 and/or network interface 205.
  • processor 201 may control network interface 205 to transmit communications to decoder 108 and/or to receive communications through network interface 104 from one or more other network nodes/entities/servers such as other encoder nodes, depository servers, etc.
  • modules may be stored in memory 203, and these modules may provide instructions so that when instructions of a module are executed by processor 201, processor 201 performs respective operations.
  • FIG. 3 is a block diagram illustrating elements of decoder 108 configured to decode video frames according to some embodiments of inventive concepts.
  • decoder 108 may include a network interface circuit 305 (also referred to as a network interface) configured to provide communications with other devices/entities/functions/etc.
  • the decoder 108 may also include a processor circuit 301 (also referred to as a processor) coupled to the network interface circuit 305, and a memory circuit 303 (also referred to as memory) coupled to the processor circuit.
  • the memory circuit 303 may include computer readable program code that when executed by the processor circuit 301 causes the processor circuit to perform operations according to embodiments disclosed herein.
  • processor circuit 301 may be defined to include memory so that a separate memory circuit is not required.
  • operations of the decoder 108 may be performed by processor 301 and/or network interface 305.
  • processor 301 may control network interface 305 to receive communications from encoder 100.
  • modules may be stored in memory 303, and these modules may provide instructions so that when instructions of a module are executed by processor 301, processor 301 performs respective operations.
  • a potential advantage provided by the inventive concepts described herein includes reducing the encoder run time by limiting the number of transform combinations to be evaluated in the case of an encoder implemented in software.
  • the complexity reduction may take another form, such as lowered silicon area usage instead of encoder run time.
  • the embodiments described herein reduce the complexity of both the encoder and decoder by replacing a transform that is computationally expensive to use or that is infrequently used by another transform for certain block sizes.
  • the DCT-8 which is relatively speaking computationally expensive
  • the DCT-2 which is relatively speaking less computationally expensive
  • the compression efficiency is increased by using CABAC contexts to binarize emt_cu_flag and emt_tu_idx.
  • a further improvement is a reduction in memory usage as no transform coefficients for the transform replaced (e.g., size-32 DCT-8) have to be stored in the memory. In a hardware implementation this may translate to a smaller silicon surface area.
  • the compression efficiency (average BD-rate for luma) is improved by 0.07% in the All Intra (AI) configuration and 0.02% in the Random Access (RA) configuration.
  • the encoding time is reduced to 85% (AI) and 95% (RA), respectively, compared to the anchor.
  • One reason for this is that the computationally expensive combination of DCT-8 horizontally and DCT-8 vertically is removed from use.
  • the improvements in compression efficiency are 0.03% (AI) and 0.01% (RA), while the encoder run time is reduced to 88% (AI) and 98% (RA), respectively.
  • Figure 4 illustrates an embodiment of how a MTS tool is presently implemented.
  • Figure 5 illustrates how the MTS of Figure 4 is changed in one embodiment.
  • each node is marked with a letter followed by a colon sign (i.e., "a:" to "j:").
  • each node is marked with two letters followed by a colon sign (i.e., "aa:" to "hh:").
  • each node is marked with three letters followed by a colon sign (i.e., "aaa:" to "jjj:").
  • the inventors realized that several different changes to the MTS tool currently implemented in the draft of the VVC standard may be made to increase computational efficiency of the encoder and decoder.
  • the decoder acknowledges this change by applying the DCT-2 instead of the DCT-8 in cases where the block is of a specific size and either the mts_tu_idx_hor or mts_tu_idx_ver indicates the use of the DCT-8. Due to this change, the text below refers to the DCT-X, which means DCT-8 for some block sizes and DCT-2 for other block sizes.
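
The DCT-X substitution can be expressed as a small lookup. The names below are hypothetical, and two details are assumptions: the set of sizes for which DCT-8 is replaced is passed in as a parameter (the text above only says "a specific size"), and the size check is applied to the block dimension along each direction separately.

```python
def resolve_dct_x(signalled, size, replaced_sizes):
    """Return the transform actually applied for one direction."""
    if signalled == "DCT-8" and size in replaced_sizes:
        return "DCT-2"   # DCT-X resolves to DCT-2 for these sizes
    return signalled     # otherwise the signalled transform is kept

def resolve_pair(hor, ver, width, height, replaced_sizes):
    """Resolve the horizontal and vertical transforms of a block."""
    return (resolve_dct_x(hor, width, replaced_sizes),
            resolve_dct_x(ver, height, replaced_sizes))

# Example with an assumed replacement set containing only size 32.
applied = resolve_pair("DCT-8", "DST-7", 32, 8, {32})
```

Since the decoder applies the same rule, no extra signalling is needed for the substitution.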
  • Change 3: The combination of DST-7 horizontally and DST-7 vertically (node j), which is the most common combination of transforms, is moved in the coding tree in Figure 5 to the position currently occupied by the DCT-X horizontally (node ee). Due to change 1, the mts_tu_idx_ver does not need to be encoded if the mts_tu_idx_hor indicates the DCT-X. This change takes advantage of this omission.
  • the mts_tu_flag indicates in which direction DCT-X and DST-7 are to be used.
  • the flag also signals whether to
  • the context is selected based on different information, for example, the block size or block shape.
  • a flag (e.g., mst_same_flag) indicating whether both transforms
  • Change 1 to change 6 are reflected in Figure 5 and change 7 is reflected in Figure 6.
  • various embodiments shall be described indicating which changes are made for specific block sizes and for specific types of blocks (i.e., inter coded blocks or intra coded blocks).
  • the encoder 100 in operation 701 determines a characteristic of the video block to be encoded.
  • the characteristic may be block size, block type (inter/intra), channel type, prediction mode, dimension (width or height) of the block as well as the direction of the intra prediction, etc.
  • the encoder 100 in operation 703 selects a first transform from a plurality of transforms used by the multiple selection transform component (MST) and that is either the most computationally expensive or least likely to be used in encoding the video block.
  • MST multiple selection transform component
  • the DCT-8 transform may be selected and designated as the first transform.
  • the encoder 100 tests combinations of transforms without testing a combination where the first transform is used both in the horizontal direction and in the vertical direction.
  • the DCT-8 transform in the scenario described in operation 703 would not be tested in both the horizontal direction and the vertical direction.
  • a combination is selected that provides the lowest rate distortion in comparison to other test combinations.
  • Other decision factors may also be used in selecting the combination to use. For example, if one of the transforms is preferred over another transform and both transforms have
  • the preferred transform may be used.
  • the video block is encoded using the combination selected to generate an encoded block.
  • the encoded block is transmitted to a decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
  • the video block is encoded using a default transform in both the horizontal and vertical directions.
  • the DCT-2 transform may be used as the default transform.
  • the encoded block is transmitted to the decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
  • the decoder 108 may perform operations.
  • the decoder receives an encoded video block that has flags.
  • a first flag is parsed to determine if the flag is set.
  • the first flag may be the mts_cu_flag.
  • the flag setting may indicate whether a first transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction.
  • the setting may be a binary setting of a 1 or a 0.
  • the first flag is equal to a first value or a second value
  • a setting of 1 may indicate the first transform is to be used in both directions.
  • a setting of 0 may be used to indicate the first transform is to be used in both directions.
  • the video block is decoded using the first transform in both the horizontal direction and the vertical direction responsive to the first flag having a value associated with the first transform being used in both directions (e.g., the first flag is equal to a first value).
  • the DCT-2 transform may be used in both the horizontal direction and the vertical direction to decode the video block.
  • a second flag is parsed responsive to the first flag setting having a value associated with the first transform not being used in both directions.
  • the second flag is parsed to determine the second flag setting.
  • the flag setting may indicate whether a second transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction.
  • the setting may be a binary setting of a 1 or a 0.
  • the second flag is equal to a first value or a second value.
  • a setting of 1 may indicate the second transform is to be used in both directions.
  • a setting of 0 may be used to indicate the second transform is to be used in both directions.
  • the second transform may be one of two transforms.
  • the second flag may be parsed to determine which of the two transforms is to be used to decode the video block.
  • the two transforms in one embodiment may be the DST-7 transform and the DCT-8 transform
  • the video block is decoded using the second transform in both the horizontal direction and the vertical direction responsive to the second flag having a value associated with the second transform being used in both directions (e.g., the second flag is equal to a first value).
  • the DST-7 transform may be used in both the horizontal direction and the vertical direction to decode the video block in operation 809.
  • a third flag is parsed responsive to the second flag setting having a value associated with the second transform not being used in both directions.
  • the third flag is parsed to determine the third flag setting.
  • the third flag setting may indicate whether a second transform is to be used to decode the encoded video block in the horizontal direction or the vertical direction and a third transform to be used to decode in the other of the horizontal direction and vertical direction.
  • This may be a first preferred transform combination.
  • the setting may be a binary setting of a 1 or a 0.
  • a setting of 1 may indicate the second transform is to be used in the horizontal direction and the third transform is to be used in the vertical direction.
  • a setting of 0 may be used to indicate the second transform is to be used in the horizontal direction and the third transform to be used in the vertical direction.
  • This may be a second preferred transform combination.
  • the third transform in an embodiment may be the first transform.
  • the video block is decoded using the second transform in either the horizontal direction or the vertical direction based on the setting of the third flag.
  • the DST-7 transform may be used in the horizontal direction and either the DCT-2 or DCT-8 transform used in the vertical direction to decode the video block in operation 813.
  • the DST-7 transform may be used in the vertical direction and either the DCT-2 or DCT-8 transform used in the horizontal direction to decode the video block in operation 813.
  • the decoder may output the decoded video block to a media player for playback of the decoded video block.
  • the decoder 108 in operation 901 determines whether a first criterion is met based on the block size of the encoded video block.
  • the criterion may be block size, block type
  • the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the first transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the first transform in the horizontal direction.
  • the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the second transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the second transform in the horizontal direction.
  • the decoder decodes the block using the selected combination.
  • the decoder may transmit the encoded block towards a media player.
  • the first transform in the embodiments described below is the DCT-2 transform
  • the second transform is the DCT-8 transform
  • the third transform is the DST-7 transform.
  • the first criterion is block size.
  • change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples.
  • all blocks of size 16x16 or smaller evaluate the following combinations:
  • the decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller, the decoded bins can indicate the following combinations:
  • the block is of size 32xN or Nx32 in the first embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
  • Table 1 shows where DCT-2 and DCT-8 are used in the first embodiment:
  • change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks of size 16x32, 32x16 or 32x32.
  • all blocks of size 16x16 or smaller, 4x32, 8x32, 32x4 and 32x8 evaluate the following combinations:
  • the decoder is able to determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller, 4x32, 8x32, 32x4 or 32x8 the decoded bins can indicate the following combinations: - DCT-2 horizontally and DCT-2 vertically
  • the block is of size 32x16, 16x32 or 32x32 in the second embodiment (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 32x32 or 32x16 or 16x32), the following combinations can be indicated:
  • Table 2 shows where DCT-2 and DCT-8 are used in the second embodiment:
  • changes 1, 3 and 4 are done for all block sizes. If a step to the right in Figure 5 is encoded as a '1', and a step to the left is encoded as a '0', the combinations would be encoded as follows:
  • the decoder will parse the flags and determine the combination of transforms based on the decoded bins.
  • the mts_cu_flag may be the first flag
  • the mts_dst_flag may be the second flag
  • the mts_tu_flag may be the third flag.
  • changes 1, 3, 4 and 5 are done for all block sizes.
  • the more preferred combination as described in change 5 can be marked by setting the mts_tu_flag to '1' and the less preferred combination as described in change 5 can be marked by setting the mts_tu_flag to '0'.
  • Figure 10 illustrates this embodiment.
  • table 4 may be used by the decoder to determine the transform combination to use.
  • the decoder parses the first flag to determine if the first flag is equal to a first value or a second value.
  • the first transform is selected to decode the encoded block in both the vertical direction and the horizontal direction.
  • a second flag is parsed to determine whether the second flag is equal to the first value or the second value.
  • the third transform is selected to decode the encoded block in both the vertical direction and the horizontal direction.
  • a third flag is parsed to determine whether the third flag is equal to the first value or the second value.
  • a more preferred transform combination is selected to decode the encoded block.
  • a less preferred transform combination is selected to decode the encoded block.
  • the combination of using DST-7 horizontally and DCT-X vertically is regarded more preferred if the intra direction is closer to horizontal than to vertical.
  • the combination of using DCT-X horizontally and DST-7 vertically is regarded as more preferred.
  • the decoder will determine the combination based on the intra direction of the block. If the intra direction is, for example, purely horizontal, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as '0', the decoder will use a transform combination of DCT-X horizontally and the DST-7 vertically.
  • If the intra direction is, for example, purely vertical, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DCT-X horizontally and the DST-7 vertically. If the flag is read as '0', the decoder will use a transform combination of DST-7 horizontally and DCT-X vertically.
  • a sixth embodiment that is one of the set of embodiments, if the block is using inter prediction, the combination of using DST-7 horizontally and DCT-X vertically is regarded as more probable if the block has a larger width than height. If the block has a larger height than width the combination of using DCT-X horizontally and DST-7 vertically is regarded as more probable.
  • If the block has, for example, a size of 16x4 samples, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as '0', the decoder will use a transform combination of DCT-X horizontally and the DST-7 vertically.
  • If the block has, for example, a size of 4x16 samples, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DCT-X horizontally and DST-7 vertically. If the flag is read as '0', the decoder will use a transform combination of DST-7 horizontally and the DCT-X vertically.
  • a 45-degree prediction direction is equally close to vertical as to horizontal. Therefore, the decoder and encoder have to agree on a tie-breaking rule to treat 45-degree directions in the same manner. In the set of embodiments above, this is handled by treating 45-degree directions as more vertical than horizontal. In a different embodiment, it may be advantageous to use a different tie-breaking rule, such as treating 45-degree directions as horizontal. Another possibility is to change at another degree than 45-degree directions. As an example, it may be advantageous to treat not only 45-degree directions as vertical, but also treat, for example, 43-degree directions as vertical, although mathematically they are closer to a horizontal direction. In general, it is therefore possible to use any angle in the tie-break rule, not just diagonal directions.
  • non-directional intra prediction modes (planar or DC).
  • these predictions are treated as more horizontal than vertical.
  • the intra modes 0-34 are treated as being closer to horizontal and the intra modes 35-66 are treated as being closer to vertical.
  • change 6 is used for intra coded blocks.
  • the selection of which context to use for encoding and decoding the mts_cu_flag is made based on the longer side of the block and the intra direction.
  • the intra directions are divided into two groups, one where using the DCT-2 horizontally and vertically is more preferred and one where using the DCT-2 horizontally and vertically is less preferred. These groups can be identical for different block sizes. Using the DCT-2 both horizontally and vertically can for example be more preferred if the intra mode is close to horizontal or vertical. In the same example, the combination would be less preferred if the intra direction is close to diagonal.
  • the decoder determines if the block is of size 32xN or Nx32 where N can be 4, 8, 16 or 32.
  • N can be 4, 8, 16 or 32.
  • the decoder determines if the block is of size 32xN or Nx32 where N can be 4, 8, 16 or 32.
  • the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1103), for instance if it is purely horizontal, one context will be chosen, for example with a first identifier (id) 0 in operation 1105.
  • N can be 4, 8, 16 or 32 and the intra direction is close to diagonal (i.e., it does not pass one of a horizontal closeness test or a vertical closeness test as determined in operation 1103), for instance if it is purely diagonal, a different context will be chosen, for example with a second id 1 in operation 1107.
  • the decoder determines if the block is of size 16xN or Nx16 where N can be 4, 8 or 16.
  • operation 1111 responsive to the block being one of size 16xN or Nx16 and the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1111), for instance if it is purely vertical, a different context will be chosen, for example with a third id 2 in operation 1113.
  • Responsive to the block being of size 16xN or Nx16 where N can be 4, 8 or 16 and the intra direction is close to diagonal (i.e., it does not pass one of a horizontal closeness test or a vertical closeness test as determined in operation 1111), for instance if it is purely diagonal, a different context will be chosen, for example with a fourth id 3 in operation 1115.
  • the decoder determines if the block is of size 8x8, 8x4, 4x8 or 4x4.
  • operation 1119 responsive to the block being one of size 8x8, 8x4, 4x8, or 4x4 and the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1119), for instance if it is purely horizontal, a different context will be chosen, for example with a fifth id 4 in operation 1121.
  • a set of tie-breaking rules should be defined for the encoder and decoder for the cases where a prediction direction is equally close to horizontal and vertical. Tie-breaking rules should also be defined for the non-directional intra prediction modes Planar or DC.
  • the intra modes 10-22 may be seen as close to horizontal and may be treated as being horizontal
  • the intra modes 46-57 may be seen as close to vertical and may be treated as being vertical
  • the remaining intra modes 0-9, 23-45 and 58-66 may be seen as close to diagonal and be treated as being diagonal.
  • change 6 is used for inter coded blocks. The selection of which context to use for encoding and decoding the mts_cu_flag is made based on the block size and shape. For example, the six contexts can be selected as follows:
  • change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 16 or 32 samples.
  • all blocks of size 8x8 or smaller evaluate the following combinations:
  • N can be 4, 8, 16 or 32, the following combinations are evaluated:
  • the decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8x8 or smaller, the decoded bins can indicate the following combinations:
  • the block is of size 16xN, Nx16, 32xN or Nx32 in the ninth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 16xN, Nx16, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
  • change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples or 4 samples.
  • all blocks of size 8x8, 8x16, 16x8 or 16x16 evaluate the following combinations:
  • N can be 4, 8, 16 or 32, the following combinations are evaluated: DCT-2 horizontally and DCT-2 vertically
  • the decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8x8, 8x16, 16x8 or 16x16, the decoded bins can indicate the following combinations:
  • the block is of size 4xN, Nx4, 32xN or Nx32 in the tenth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 4xN, Nx4, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
  • Table 8 shows where DCT-2 and DCT-8 are used in the tenth embodiment:
  • change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples or the block has a size of 4x4 samples.
  • all blocks of size 16x16 or smaller but larger than 4x4 evaluate the following combinations:
  • the decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller but larger than 4x4, the decoded bins can indicate the following combinations:
  • N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 4x4, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
  • the decoder can determine the correct combination of transforms based on the parsed flags.
  • the decoded bins can indicate the following
  • change 7 is incorporated.
  • a new flag, called mts_same_flag, is signaled to indicate whether a block uses the same transform in both the horizontal and vertical directions. In one embodiment, if the flag has the value '1', the block uses identical transforms in both directions, whereas if the flag has the value '0', two different transforms will be used.
  • the mts_same_flag indicates that a block uses the same transform in both horizontal and vertical direction.
  • An additional flag mts_tu_idx is signaled to indicate whether to use DCT-8 or DST-7 in both directions.
  • the mts_same_flag indicates that a block uses different transforms in horizontal and vertical direction.
  • An additional flag mts_tu_idx is signaled to indicate whether to use DCT-8 in the horizontal direction and DST-7 in the vertical direction, or DST-7 in the horizontal direction and DCT-8 in the vertical direction.
  • the mts_same_flag is parsed by the decoder, indicating that the same transform should be used in both horizontal and vertical direction.
  • the mts_tu_idx is parsed by the decoder, indicating whether to use DST-7 or DCT-8 in both directions.
  • the mts_same_flag is parsed by the decoder, indicating that two different transforms should be used for the current block.
  • the mts_tu_idx is parsed by the decoder to determine whether to use DCT-8 in horizontal and DST-7 in vertical direction, or DST-7 in horizontal and DCT-8 in vertical direction
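
The mts_same_flag signalling described in the last items above can be sketched as a small decision function. This is only an illustrative sketch, not the method as claimed: the function name, the `read_bin` callback, and the mapping of mts_tu_idx values 0 and 1 to particular transforms are assumptions, since the text does not state which value selects which transform. The sketch covers only these two flags.

```python
from typing import Callable, Tuple

DST7, DCT8 = "DST-7", "DCT-8"


def select_transforms_same_flag(read_bin: Callable[[], int]) -> Tuple[str, str]:
    """Return (horizontal, vertical) transforms from mts_same_flag and mts_tu_idx.

    read_bin is a hypothetical callback returning the next decoded bin (0 or 1).
    """
    mts_same_flag = read_bin()
    mts_tu_idx = read_bin()
    if mts_same_flag == 1:
        # Same transform in both directions; mts_tu_idx picks which one.
        # Assumption: 0 selects DST-7 and 1 selects DCT-8.
        transform = DST7 if mts_tu_idx == 0 else DCT8
        return (transform, transform)
    # Different transforms in the two directions.
    # Assumption: 0 selects DST-7 horizontally and DCT-8 vertically.
    return (DST7, DCT8) if mts_tu_idx == 0 else (DCT8, DST7)
```

With this layout every one of the four DST-7/DCT-8 combinations costs exactly two bins, and the first bin alone already tells the decoder whether a single transform kernel serves both directions.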

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

There are provided mechanisms for methods and apparatuses for transform selection in encoding and decoding.

Description

METHOD AND APPARATUS FOR SELECTING TRANSFORM SELECTION IN AN ENCODER AND DECODER
TECHNICAL FIELD
[0001] The application relates to methods and apparatuses for transform selection in encoding and decoding.
BACKGROUND
[0002] The current test model VTM of the video codec under standardization, Versatile Video Coding (VVC), includes a tool called Multiple Transform Selection (MTS). This tool allows an encoder to choose between three different transforms. These transforms consist of two variants of a Discrete Cosine Transformation (DCT) and one variant of a Discrete Sine Transformation (DST). During encoding, a transform is typically performed in the horizontal direction of the block, followed by a second transform in the vertical direction. These two transforms are independent of each other, so it is possible to use different transforms in different directions. The set of transforms that can be selected from includes DCT-2, DST-7 and DCT-8 [2].
[0003] The encoder tests all allowed combinations when selecting a transform to use. These are dependent on block type (inter/intra), block size, channel type and prediction mode. For example, for intra blocks in the luma channel with sizes between 4x4 samples and 32x32 samples, five different combinations are tested:
1. DCT-2 horizontal and DCT-2 vertical
2. DST-7 horizontal and DST-7 vertical
3. DST-7 horizontal and DCT-8 vertical
4. DCT-8 horizontal and DST-7 vertical
5. DCT-8 horizontal and DCT-8 vertical
[0004] Blocks that are larger or in the chroma channel use only the DCT-2 in both directions. The tool can be enabled separately for intra and inter prediction. In the Common Test Conditions (CTC) [1] the tool is only enabled for intra predicted blocks. When the tool is disabled, the encoder uses the DCT-2 in both directions.
[0005] To reduce the number of bits needed to code the chosen combination, an arithmetic coder with adaptive probabilities may be used (Context-Adaptive Binary Arithmetic Coding, CABAC). The coder uses different contexts, each indicating a separate probability, to encode bins in the most efficient way. In the bit stream the combination chosen by the encoder is signaled as follows:
• emt_cu_flag: 1 bin using 6 CABAC contexts to signal whether DCT-2 is used both horizontally and vertically. The context is chosen based on the split depth of the current block. The flag is only signaled for luma blocks with sizes between 4x4 and 32x32 samples, and only if MTS is allowed for the current prediction mode. If the value of the flag is 0, the DCT-2 is used, otherwise the emt_tu_idx is used to determine the combination of transforms.
• emt_tu_idx: 2 bins using 4 CABAC contexts to signal which of the remaining four combinations is used. Two contexts are used if the block is intra-coded, while the remaining 2 contexts are used for inter-coded blocks. (In the CTC, these last two contexts are not used since MTS is turned off for inter coding in the CTC.) One context is used per bin, so the first bin uses context 0 or 2 (depending on the prediction mode), whereas the second bin always uses context 1 or 3. The possible values for these two bins range from 0 (00, indicating DST-7 in both directions) to 3 (11, indicating DCT-8 in both directions). These two bins are only signaled if the emt_cu_flag has the value 1.
[0006] In the decoder, a corresponding process is carried out. First, the emt_cu_flag is parsed. If the flag is set, the emt_tu_idx is parsed to determine the transform to be used.
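The baseline signalling of paragraphs [0005] and [0006] can be illustrated with a short sketch. Several details here are assumptions made only for illustration: the `read_bin(name, context)` callback, the choice of the first bin as the more significant bit, the assignment of contexts 0 and 1 to intra blocks and 2 and 3 to inter blocks, and the meaning of index values 1 and 2 (the text only fixes 0 as DST-7 in both directions and 3 as DCT-8 in both directions).

```python
from typing import Callable, Tuple

DCT2, DST7, DCT8 = "DCT-2", "DST-7", "DCT-8"

# (horizontal, vertical) per emt_tu_idx value.
# Entries 0 and 3 follow the text; entries 1 and 2 are assumed.
EMT_TU_IDX_TABLE = {
    0: (DST7, DST7),
    1: (DCT8, DST7),
    2: (DST7, DCT8),
    3: (DCT8, DCT8),
}


def emt_tu_idx_context(bin_index: int, is_intra: bool) -> int:
    """One context per bin: 0/1 for intra and 2/3 for inter (assumed ordering)."""
    return bin_index + (0 if is_intra else 2)


def parse_emt(read_bin: Callable[[str, int], int],
              is_intra: bool,
              cu_flag_ctx: int) -> Tuple[str, str]:
    """Parse emt_cu_flag and, if it is set, the two bins of emt_tu_idx."""
    if read_bin("emt_cu_flag", cu_flag_ctx) == 0:
        return (DCT2, DCT2)  # flag not set: DCT-2 in both directions
    first_bin = read_bin("emt_tu_idx", emt_tu_idx_context(0, is_intra))
    second_bin = read_bin("emt_tu_idx", emt_tu_idx_context(1, is_intra))
    return EMT_TU_IDX_TABLE[(first_bin << 1) | second_bin]
```

Note that in this scheme every non-DCT-2 combination costs three bins in total, regardless of how likely it is to be chosen.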
[0007] Note that in some cases, the names EMT (Explicit Multiple-core Transform) or AMT (Adaptive Multicore Transform) are used for the transform tool. These were previous names of the tool, which has since been changed to MTS. As all names refer to the same tool, they may be used interchangeably.
[0008] The current draft of the VVC standard performs an exhaustive search through all possible combinations. This results in the encoder spending a lot of time on testing different modes, some of which are very unlikely to be chosen. Furthermore, the binary coder uses an inefficient way to signal the transform index.
SUMMARY
[0009] A first aspect of the embodiments defines a method performed by a decoder. The method comprises receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding. The method comprises parsing at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction. Responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, the method further comprises decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block. Responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, the method comprises parsing a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block. Responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction, the method comprises decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block. Responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction, the method comprises parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block. The method comprises decoding the encoded video block using the second and third transforms to generate the decoded video block.
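The three-flag decision sequence of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the `read_flag` callback are hypothetical, a flag value of 1 is assumed to mean "set", and a third flag value of 1 is assumed to place the second transform in the horizontal direction. The transforms are passed in as parameters because the aspect does not fix which transform plays which role.

```python
from typing import Callable, Tuple


def select_transform_pair(read_flag: Callable[[], int],
                          first: str, second: str, third: str) -> Tuple[str, str]:
    """Return (horizontal, vertical) transforms using up to three parsed flags."""
    if read_flag() == 1:
        # First flag set: first transform in both directions.
        return (first, first)
    if read_flag() == 1:
        # Second flag set: second transform in both directions.
        return (second, second)
    # The third flag decides which direction gets the second transform;
    # the third transform is used in the other direction.
    if read_flag() == 1:
        return (second, third)  # assumed polarity: second transform horizontal
    return (third, second)
```

Because later flags are parsed only when the earlier ones are not set, the first combination costs one bin, the second costs two, and the two mixed combinations cost three.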
[0010] A second aspect of the embodiments defines a decoder comprising at least one processor and a memory coupled to the processor. The memory comprises instructions executable by the processor, which cause the processor to perform receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding. The memory comprises instructions executable by the processor, which cause the processor to perform parsing the at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, parsing a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block. The memory comprises instructions executable by the processor, which cause the processor to perform decoding the encoded video block using the second and third transforms to generate the decoded video block.
[0011] A third aspect of the embodiments defines a computer program for a decoder. The computer program comprises code means which, when run on a computer, causes the computer to receive an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding. The computer program comprises code means which, when run on a computer, causes the computer to parse at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction. The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction, decode the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block. The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction, parse a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block. The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction, decode the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block.
The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction, parse a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block. The computer program comprises code means which, when run on a computer, causes the computer to decode the encoded video block using the second and third transforms to generate the decoded video block.
[0012] A fourth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the third aspect, stored on the computer readable means.
[0013] A fifth aspect of the embodiments defines a method performed by an encoder. The method comprises receiving a video block for encoding. The method comprises determining a characteristic of the video block. The method further comprises, responsive to the characteristic being of a type that indicates a multiple transform selection is used, selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block. The method comprises testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction. The method comprises selecting a combination from the combinations that provides the lowest rate distortion. The method comprises encoding the video block using the selected combination to generate an encoded video block. The method comprises, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encoding the video block using a default transform in the horizontal direction and the vertical direction.
[0014] A sixth aspect of the embodiments defines an encoder for encoding a block of video based on a block size of the block, wherein each of a horizontal direction and a vertical direction of the block is encoded using a transform, wherein the transform can be one of a first transform, a second transform or a third transform. The encoder comprises at least one processor and a memory coupled to the processor. The memory comprises instructions executable by the processor, which cause the processor to perform determining a characteristic of the video block. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the characteristic being of a type that indicates a multiple transform selection is used, selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block. The memory comprises instructions executable by the processor, which cause the processor to perform testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction. The memory comprises instructions executable by the processor, which cause the processor to perform selecting a combination from the combinations that provides the lowest rate distortion. The memory comprises instructions executable by the processor, which cause the processor to perform encoding the video block using the selected combination to generate an encoded video block. The memory comprises instructions executable by the processor, which cause the processor to perform, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encoding the video block using a default transform in the horizontal direction and the vertical direction.
[0015] A seventh aspect of the embodiments defines a computer program for encoding a block of video based on a block size of the block, wherein each of a horizontal direction and a vertical direction of the block is encoded using a transform, wherein the transform can be one of a first transform, a second transform or a third transform. The computer program comprises code means which, when run on a computer, causes the computer to determine a characteristic of the video block. The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the characteristic being of a type that indicates a multiple transform selection is used, select a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block. The computer program comprises code means which, when run on a computer, causes the computer to test combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction. The computer program comprises code means which, when run on a computer, causes the computer to select a combination from the combinations that provides the lowest rate distortion. The computer program comprises code means which, when run on a computer, causes the computer to encode the video block using the selected combination to generate an encoded video block. The computer program comprises code means which, when run on a computer, causes the computer to, responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used, encode the video block using a default transform in the horizontal direction and the vertical direction.
[0016] An eighth aspect of the embodiments defines a computer program product comprising computer readable means and a computer program according to the seventh aspect, stored on the computer readable means.
[0017] The advantages provided by the inventive concepts include reducing the encoder complexity by removing one of the five combinations described above. Both encoder and decoder complexity are reduced by using a less complex transform for certain block sizes. Furthermore, the efficiency of the binarization is increased as the number of bins for the most common combination (DST-7 in both directions) is reduced from 3 to 2.
BRIEF DESCRIPTION OF THE DRAWINGS
[0018] The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this application, illustrate certain non-limiting embodiments of inventive concepts. In the drawings:
[0019] Figure 1 is a block diagram illustrating an example of an environment of a system in which the encoder and decoder may be implemented according to some embodiments;
[0020] Figure 2 is a block diagram illustrating an encoder according to some embodiments;
[0021] Figure 3 is a block diagram illustrating a decoder according to some embodiments;
[0022] Figure 4 is a block diagram illustrating components of an MTS tool;
[0023] Figure 5 is a block diagram illustrating components of an MTS tool according to some embodiments;
[0024] Figure 6 is a block diagram illustrating components of an MTS tool according to some embodiments;
[0025] Figures 7-11 are flow charts illustrating operations of an encoder and/or decoder in accordance with some embodiments of inventive concepts.
DETAILED DESCRIPTION
[0026] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present/used in another embodiment.
[0027] The following description presents various embodiments of the disclosed subject matter. These embodiments are presented as teaching examples and are not to be construed as limiting the scope of the disclosed subject matter. For example, certain details of the described embodiments may be modified, omitted, or expanded upon without departing from the scope of the described subject matter.
[0028] Figure 1 illustrates an example of an operating environment of an encoder 100 that may be used to encode bitstreams as described herein. The encoder 100 has a multiple transform selection (MTS) component 102 used in encoding. The encoder 100 receives video from network 104 and/or from storage 106 and encodes the video into bitstreams using MTS component 102 for defined block sizes of the video as described below and transmits the encoded video to decoder 108 via network 110. Storage device 106 may be part of a storage depository of videos such as a storage repository of a store or a streaming video service, a separate storage component, a component of a mobile device, etc. The decoder 108 may be part of a device 112 having an audio/video (A/V) media player 114. The device 112 may be a mobile device, a set-top device, a desktop computer, and the like.
[0029] Figure 2 is a block diagram illustrating elements of encoder 100 configured to encode video frames according to some embodiments of inventive concepts. As shown, encoder 100 may include a network interface circuit 205 (also referred to as a network interface) configured to provide communications with other devices/entities/functions/etc. The encoder 100 may also include a processor circuit 201 (also referred to as a processor) coupled to the network interface circuit 205, and a memory circuit 203 (also referred to as memory) coupled to the processor circuit. The memory circuit 203 may include computer readable program code that when executed by the processor circuit 201 causes the processor circuit to perform operations according to embodiments disclosed herein.
[0030] According to other embodiments, processor circuit 201 may be defined to include memory so that a separate memory circuit is not required. As discussed herein, operations of the encoder 100 may be performed by processor 201 and/or network interface 205. For example, processor 201 may control network interface 205 to transmit communications to decoder 108 and/or to receive communications through network interface 205 from one or more other network nodes/entities/servers such as other encoder nodes, depository servers, etc. Moreover, modules may be stored in memory 203, and these modules may provide instructions so that when instructions of a module are executed by processor 201, processor 201 performs respective operations.
[0031] Figure 3 is a block diagram illustrating elements of decoder 108 configured to decode video frames according to some embodiments of inventive concepts. As shown, decoder 108 may include a network interface circuit 305 (also referred to as a network interface) configured to provide communications with other devices/entities/functions/etc. The decoder 108 may also include a processor circuit 301 (also referred to as a processor) coupled to the network interface circuit 305, and a memory circuit 303 (also referred to as memory) coupled to the processor circuit. The memory circuit 303 may include computer readable program code that when executed by the processor circuit 301 causes the processor circuit to perform operations according to embodiments disclosed herein.
[0032] According to other embodiments, processor circuit 301 may be defined to include memory so that a separate memory circuit is not required. As discussed herein, operations of the decoder 108 may be performed by processor 301 and/or network interface 305. For example, processor 301 may control network interface 305 to receive communications from encoder 100. Moreover, modules may be stored in memory 303, and these modules may provide instructions so that when instructions of a module are executed by processor 301, processor 301 performs respective operations.
[0033] A potential advantage provided by the inventive concepts described herein includes reducing the encoder run time by limiting the number of transform combinations to be evaluated in the case of an encoder implemented in software. In the case of an encoder implemented in hardware, the complexity reduction may take another form, such as lowered silicon area usage instead of encoder run time.
[0034] The embodiments described herein reduce the complexity of both the encoder and decoder by replacing a transform that is computationally expensive to use or that is infrequently used with another transform for certain block sizes. For example, in an encoder that is configured to operate under the VVC standard, the DCT-8, which is relatively speaking computationally expensive, may be replaced by the DCT-2, which is relatively speaking less computationally expensive, for certain block sizes.
[0035] Furthermore, the compression efficiency is increased by using CABAC contexts to binarize emt_cu_flag and emt_tu_idx.
[0036] A further improvement is a reduction in memory usage as no transform coefficients for the transform replaced (e.g., size-32 DCT-8) have to be stored in the memory. In a hardware implementation this may translate to a smaller silicon surface area.
[0037] For example, in an implementation based on an anchor using VTM-2.0.1 according to the Common Test Conditions (CTC) as described in [1], the compression efficiency (average BD-rate for luma) is improved by 0.07% in the All Intra (AI) configuration and 0.02% in the Random Access (RA) configuration. At the same time, the encoding time is reduced to 85% (AI) and 95% (RA), respectively, compared to the anchor. There is minimal, if any, impact on the complexity of the decoder, but to the extent that there is an impact, it is favorable. One reason for this is that the computationally expensive combination of DCT-8 horizontally and DCT-8 vertically is removed from use. When implementing the same modifications in VTM-3.0, the improvements in compression efficiency are 0.03% (AI) and 0.01% (RA), while the encoder run time is reduced to 88% (AI) and 98% (RA), respectively.
[0038] In the description that follows, an encoder and decoder configured to perform in accordance with portions of the VVC standardization are used to describe the inventive concepts. Other standardizations may be implemented using the concepts described herein.
[0039] Figure 4 illustrates an embodiment of how an MTS tool is presently implemented. Figure 5 illustrates how the MTS of Figure 4 is changed in one embodiment. In Figure 4, each node is marked with a letter followed by a colon sign (i.e., "a:" to "j:"). In Figure 5, each node is marked with two letters followed by a colon sign (i.e., "aa:" to "hh:"). In Figure 6, each node is marked with three letters followed by a colon sign (i.e., "aaa:" to "jjj:"). The inventors realized that several different changes to the MTS tool currently implemented in the draft of the VVC standard may be made to increase computational efficiency of the encoder and decoder. In the following description of the changes, the nodes of Figures 4-6 will be referred to by the letter or letters in the figures. Based on Figure 4, the following changes are made:
Change 1: The combination of DCT-8 horizontally and DCT-8 vertically in branch 2 (node g) is no longer allowed. This implies that an encoder does not evaluate this combination, thus reducing the evaluation run time. The decoder can conclude that, if the mts_tu_idx_hor indicates DCT-8 (node e), the mts_tu_idx_ver will, with the change, always indicate DST-7 (node h).
Change 2: For certain block sizes the DCT-8 in branch 2 (nodes e, g and i in Figure 4, nodes gg and hh in Figure 5) is replaced by the DCT-2. If a block is of a specific size, the encoder will know that in branch 2 it will evaluate the DCT-2 instead of the DCT-8. This adds on to change 1 as for these blocks the combination of DCT-2 horizontally and DCT-2 vertically in branch 2 should not be evaluated, as this exact case is already covered in branch 1. The decoder acknowledges this change by applying the DCT-2 instead of the DCT-8 in cases where the block is of a specific size and either the mts_tu_idx_hor or mts_tu_idx_ver indicates the use of the DCT-8. Due to this change, the text below refers to the DCT-X, which means DCT-8 for some block sizes and DCT-2 for other block sizes.
Change 3: The combination of DST-7 horizontally and DST-7 vertically (node j), which is the most common combination of transforms, is moved in the coding tree in Figure 5 to the position currently occupied by the DCT-X horizontally (node ee). Due to change 1, the mts_tu_idx_ver does not need to be encoded if the mts_tu_idx_hor indicates the DCT-X. This change takes advantage of this omission.
Change 4: As the mts_tu_idx_ver flag is only encoded based on the value of the mts_tu_idx_hor flag, both flags are removed and replaced by two new flags. The two new flags (also illustrated in Figure 5) are mts_dst_flag and mts_tu_flag. a) The mts_dst_flag indicates whether to use DST-7 in both directions. b) The mts_tu_flag indicates in which direction DCT-X and DST-7 are to be used.
Change 5: The mts_tu_flag signals whether to use a more preferred combination or a less preferred combination. The determination of which of the available combinations is more preferred is made based on the direction of the intra prediction. If a block is using inter prediction, the flag also signals whether to use a more preferred combination, but the determination of which combination is more preferred is based on different information, for instance the block size or block shape.
Change 6: Previously, the context selection of the mts_cu_flag was made based on the split depth. In these changes, the correct context is determined based on the larger dimension (width or height) of the block as well as the direction of the intra prediction. If a block uses inter prediction, the context is selected based on different information, for example, the block size or block shape.
Change 7 (see Figure 6): The mts_tu_idx_hor flag is replaced by a flag (e.g., mst_same_flag) indicating whether both transforms are identical. If the flag is set, the same transform will be used in both directions. An additional bit will be encoded to indicate which transform to use. If the flag is not set, two different transforms will be used in the two directions, with an additional bit being encoded to indicate which transform to use in which direction.
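The two-bin signalling of change 7 can be illustrated with a minimal Python sketch. The function name `decode_change7`, the polarity of the additional bit, and the use of DST-7 and DCT-8 as the two transforms are assumptions made for illustration only; Figure 6 defines the actual tree.

```python
def decode_change7(same_flag, extra_bit):
    """Return (horizontal, vertical) transforms for the two parsed bins."""
    # Assumed ordering: the additional bit indexes into this pair.
    transforms = ("DST-7", "DCT-8")
    if same_flag:
        # Same transform in both directions; the extra bit says which one.
        chosen = transforms[extra_bit]
        return (chosen, chosen)
    # Two different transforms; the extra bit says which one is horizontal.
    return (transforms[extra_bit], transforms[1 - extra_bit])


for same in (1, 0):
    for bit in (0, 1):
        print(same, bit, decode_change7(same, bit))
```

Under these assumptions every one of the four branch-2 combinations costs exactly two bins after the flag that selects branch 2.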
[0040] Change 1 to change 6 are reflected in Figure 5 and change 7 is reflected in Figure 6. In the description that follows, various embodiments shall be described indicating which changes are made for specific block sizes and for specific types of blocks (i.e., inter coded blocks or intra coded blocks).
[0041] Prior to describing various embodiments based on the above changes, an overview of how the encoder 100 and decoder 108 operate with the changes implemented shall be described. Turning now to Figure 7, the encoder 100 in operation 701 determines a characteristic of the video block to be encoded. The characteristic may be block size, block type (inter/intra), channel type, prediction mode, dimension (width or height) of the block as well as the direction of the intra prediction, etc.
[0042] Responsive to the characteristic being of a type that indicates a multiple transform selection component is used, the encoder 100 in operation 703 selects a first transform from a plurality of transforms used by the multiple transform selection (MTS) component, the first transform being either the most computationally expensive or least likely to be used in encoding the video block. For example, when the transforms used by the MTS component are DCT-2, DST-7, and DCT-8, the DCT-8 often is the most computationally expensive to use. In such scenarios, the DCT-8 transform may be selected and designated as the first transform.
[0043] In operation 705, the encoder 100 tests combinations of transforms without testing a combination where the first transform is used both in the horizontal direction and in the vertical direction. For example, the DCT-8 transform in the scenario described in operation 703 would not be tested in both the horizontal direction and the vertical direction.
[0044] In operation 707, a combination is selected that provides the lowest rate distortion in comparison to other test combinations. Other decision factors may also be used in selecting the combination to use. For example, if one of the transforms is preferred over another transform and both transforms have comparable rate distortions, the preferred transform may be used.
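Operations 705 and 707 can be sketched in Python as follows. This is an illustrative sketch only: `select_combination` and `rd_cost` are hypothetical names, and the cost values are made up to show that the excluded combination is never evaluated even when it would be the cheapest.

```python
def select_combination(candidates, first_transform, rd_cost):
    """Return the (horizontal, vertical) pair with the lowest cost.

    candidates: iterable of (horizontal, vertical) transform names.
    rd_cost: caller-supplied (hypothetical) function returning the
             rate-distortion cost of encoding the block with a pair.
    """
    best_pair, best_cost = None, float("inf")
    for hor, ver in candidates:
        # Operation 705: never test the first transform in both directions.
        if hor == first_transform and ver == first_transform:
            continue
        cost = rd_cost(hor, ver)
        if cost < best_cost:
            best_pair, best_cost = (hor, ver), cost
    return best_pair


# Made-up costs, for illustration only.
example_costs = {
    ("DCT-2", "DCT-2"): 10.0,
    ("DST-7", "DST-7"): 8.5,
    ("DST-7", "DCT-8"): 9.0,
    ("DCT-8", "DST-7"): 9.2,
    ("DCT-8", "DCT-8"): 7.0,  # cheapest here, but excluded from the search
}
best = select_combination(example_costs, "DCT-8",
                          lambda h, v: example_costs[(h, v)])
print(best)
```

A tie-breaking rule that favors a preferred transform when costs are comparable, as mentioned above, would replace the strict `<` comparison; it is omitted here to keep the sketch short.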
[0045] In operation 709, the video block is encoded using the combination selected to generate an encoded block. In operation 711, the encoded block is transmitted to a decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
[0046] Responsive to the characteristic not being of the type, the video block is encoded using a default transform in both the horizontal and vertical directions. In one embodiment, the DCT-2 transform may be used as the default transform. In operation 715, the encoded block is transmitted to the decoder, such as decoder 108, with flags that are used by the decoder to determine which combination was used in encoding and is to be used in decoding the encoded block.
[0047] Turning now to Figure 8, operations that the decoder 108 may perform are illustrated. In operation 801, the decoder receives an encoded video block that has flags. In operation 803, a first flag is parsed to determine if the flag is set. The first flag may be the mts_cu_flag. The flag setting may indicate whether a first transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction. For example, in one embodiment, the setting may be a binary setting of a 1 or a 0. In other words, the first flag is equal to a first value or a second value. A setting of 1 may indicate the first transform is to be used in both directions. In other embodiments, a setting of 0 may be used to indicate the first transform is to be used in both directions.
[0048] In operation 805, the video block is decoded using the first transform in both the horizontal direction and the vertical direction responsive to the first flag having a value associated with the first transform being used in both directions (e.g., the first flag is equal to a first value). For example, the DCT-2 transform may be used in both the horizontal direction and the vertical direction to decode the video block.
[0049] In operation 807, a second flag is parsed responsive to the first flag setting having a value associated with the first transform not being used in both directions. The second flag is parsed to determine the second flag setting. The flag setting may indicate whether a second transform is to be used to decode the encoded video block in both the horizontal direction and the vertical direction. For example, in one embodiment, the setting may be a binary setting of a 1 or a 0. In other words, the second flag is equal to a first value or a second value. A setting of 1 may indicate the second transform is to be used in both directions. In other embodiments, a setting of 0 may be used to indicate the second transform is to be used in both directions.
[0050] The second transform may be one of two transforms. The second flag may be parsed to determine which of the two transforms is to be used to decode the video block. For example, the two transforms in one embodiment may be the DST-7 transform and the DCT-8 transform.
[0051] In operation 809, the video block is decoded using the second transform in both the horizontal direction and the vertical direction responsive to the second flag having a value associated with the second transform being used in both directions (e.g., the second flag is equal to a first value). For example, the DST-7 transform may be used in both the horizontal direction and the vertical direction to decode the video block in operation 809.
[0052] In operation 811, a third flag is parsed responsive to the second flag setting having a value associated with the second transform not being used in both directions. The third flag is parsed to determine the third flag setting. The third flag setting may indicate whether a second transform is to be used to decode the encoded video block in the horizontal direction or the vertical direction and a third transform is to be used to decode in the other of the horizontal direction and vertical direction. This may be a first preferred transform combination. For example, in one embodiment, the setting may be a binary setting of a 1 or a 0. A setting of 1 may indicate the second transform is to be used in the horizontal direction and the third transform is to be used in the vertical direction. In other embodiments, a setting of 0 may be used to indicate the second transform is to be used in the horizontal direction and the third transform is to be used in the vertical direction. This may be a second preferred transform combination. The third transform in an embodiment may be the first transform.
[0053] In operation 813, the video block is decoded using the second transform in either the horizontal direction or the vertical direction based on the setting of the third flag. For example, the DST-7 transform may be used in the horizontal direction and either the DCT-2 or DCT-8 transform used in the vertical direction to decode the video block in operation 813. Alternatively, the DST-7 transform may be used in the vertical direction and either the DCT-2 or DCT-8 transform used in the horizontal direction to decode the video block in operation 813.
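The parsing order of operations 803 to 813 can be sketched in Python as follows. This is an illustrative sketch, not the decoder's actual implementation: `read_flag` and `bin_reader` are hypothetical helpers, a value of 1 is assumed to mean "set" for every flag (the text notes that other embodiments may use 0), and the third transform is taken to be DCT-8 although it may instead be DCT-2.

```python
def select_transforms(read_flag, first="DCT-2", second="DST-7", third="DCT-8"):
    """Return (horizontal, vertical) by parsing up to three flags in order.

    read_flag: hypothetical callable returning the next decoded bin (0 or 1).
    """
    if read_flag() == 1:         # first flag, e.g. mts_cu_flag (operation 803)
        return (first, first)    # operation 805
    if read_flag() == 1:         # second flag (operation 807)
        return (second, second)  # operation 809
    if read_flag() == 1:         # third flag (operation 811)
        return (second, third)   # second horizontally, third vertically
    return (third, second)       # third horizontally, second vertically


def bin_reader(bins):
    """Wrap a list of already-decoded bins as a read_flag callable."""
    iterator = iter(bins)
    return lambda: next(iterator)


print(select_transforms(bin_reader([0, 1])))
print(select_transforms(bin_reader([0, 0, 0])))
```

Note that later flags are read only when the earlier ones leave the choice open, so the number of bins consumed depends on the combination that was signalled.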
[0054] In operation 815, the decoder may output the decoded video block to a media player for playback of the decoded video block.
[0055] Turning now to Figure 9, in an alternate embodiment, the decoder 108 in operation 901 determines whether a first criterion is met based on the block size of the encoded video block. The criterion may be block size, block type (inter/intra), channel type, prediction mode, dimension (width or height) of the block, etc.
[0056] In operation 903, responsive to the first criterion being met, the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the first transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the first transform in the horizontal direction.
[0057] In operation 905, responsive to the first criterion not being met, the decoder selects the transform combination from one of: the first transform in both the vertical direction and the horizontal direction; the third transform in both the vertical direction and the horizontal direction; the second transform in the vertical direction and the third transform in the horizontal direction; and the third transform in the vertical direction and the second transform in the horizontal direction.
[0058] In operation 907, the decoder decodes the block using the selected combination. In operation 909, the decoder may transmit the decoded block towards a media player.
[0059] The first transform in the embodiments described below is the DCT-2 transform, the second transform is the DCT-8 transform, and the third transform is the DST-7 transform. In the description of the embodiments that follows, the first criterion is block size.
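With these assignments, the candidate sets of operations 903 and 905 can be sketched in Python as follows. The function name `candidate_combinations` and the boolean `criterion_met` argument are illustrative assumptions; how the criterion is evaluated is left to the caller.

```python
FIRST, SECOND, THIRD = "DCT-2", "DCT-8", "DST-7"


def candidate_combinations(criterion_met):
    """Return the four (horizontal, vertical) pairs the decoder selects among."""
    # Criterion met: the first transform pairs with the third transform.
    # Criterion not met: the second transform pairs with the third transform.
    other = FIRST if criterion_met else SECOND
    return [
        (FIRST, FIRST),
        (THIRD, THIRD),
        (THIRD, other),  # third horizontally, other vertically
        (other, THIRD),  # other horizontally, third vertically
    ]


print(candidate_combinations(True))
print(candidate_combinations(False))
```

Either way the decoder chooses among four combinations, so the same flags can be parsed regardless of whether the criterion is met.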
[0060] In a first embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples. In this first embodiment all blocks of size 16x16 or smaller evaluate the following combinations:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-8 vertically
- DCT-8 horizontally and DST-7 vertically
[0061] For blocks of size 32xN or Nx32 in the first embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-2 vertically
- DCT-2 horizontally and DST-7 vertically
[0062] The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller, the decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-8 vertically
- DCT-8 horizontally and DST-7 vertically
[0063] If the block is of size 32xN or Nx32 in the first embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-2 vertically
- DCT-2 horizontally and DST-7 vertically
[0064] Table 1 shows where DCT-2 and DCT-8 are used in the first embodiment:
Table 1 (image not reproduced)
[0065] In a second embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks of size 16x32, 32x16 or 32x32. In this embodiment all blocks of size 16x16 or smaller, 4x32, 8x32, 32x4 and 32x8 evaluate the following combinations:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-8 vertically
- DCT-8 horizontally and DST-7 vertically
[0066] For blocks of size 32x16, 16x32 or 32x32 in the second embodiment, the following combinations are evaluated:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-2 vertically
- DCT-2 horizontally and DST-7 vertically
[0067] The decoder is able to determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller, 4x32, 8x32, 32x4 or 32x8, the decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-8 vertically
- DCT-8 horizontally and DST-7 vertically
[0068] If the block is of size 32x16, 16x32 or 32x32 in the second embodiment (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 32x32 or 32x16 or 16x32), the following combinations can be indicated:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-2 vertically
- DCT-2 horizontally and DST-7 vertically
[0069] Table 2 shows where DCT-2 and DCT-8 are used in the second embodiment:
Table 2 (image not reproduced)
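The block-size rules stated in the text for the first and second embodiments can be summarized in a short Python sketch. The function name `dct_x` and the `embodiment` parameter are illustrative assumptions, and the grid it prints is derived from the prose rules above, not copied from Tables 1 and 2.

```python
def dct_x(width, height, embodiment):
    """Return the transform that DCT-X denotes for a block of the given size."""
    if embodiment == 1:
        # First embodiment: at least one dimension is 32 samples long.
        replaced = width == 32 or height == 32
    elif embodiment == 2:
        # Second embodiment: only 16x32, 32x16 and 32x32 blocks.
        replaced = (width, height) in {(16, 32), (32, 16), (32, 32)}
    else:
        raise ValueError("only embodiments 1 and 2 are sketched here")
    return "DCT-2" if replaced else "DCT-8"


sizes = (4, 8, 16, 32)
for emb in (1, 2):
    print(f"embodiment {emb}: rows are heights, columns are widths {sizes}")
    for h in sizes:
        print(h, [dct_x(w, h, emb) for w in sizes])
```

The only blocks on which the two embodiments differ are 4x32, 8x32, 32x4 and 32x8, which keep the DCT-8 in the second embodiment.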
[0070] In a third embodiment, changes 1, 3 and 4 are done for all block sizes. If a step to the right in Figure 5 is encoded as a '1', and a step to the left is encoded as a '0', the combinations would be encoded as follows:
Table 3 (image not reproduced)
[0071] The decoder will parse the flags and determine the combination of transforms based on the decoded bins. With respect to Figure 8, the mts_cu_flag may be the first flag, the mts_dst_flag may be the second flag, and the mts_tu_flag may be the third flag.
Figure imgf000022_0001
Table 4
[0072] In a set of embodiments, changes 1, 3, 4 and 5 are done for all block sizes. As an example, the more preferred combination as described in change 5 can be marked by setting the mts_tu_flag to '1' and the less preferred combination as described in change 5 can be marked by setting the mts_tu_flag to '0'. Figure 10 illustrates this embodiment. Turning to Figure 10, Table 4 may be used by the decoder to determine the transform combination to use. In operation 1001, the decoder parses the first flag to determine if the first flag is equal to a first value or a second value. In operation 1003, responsive to the first flag being equal to the first value, the first transform is selected to decode the encoded block in both the vertical direction and the horizontal direction. In operation 1005, responsive to the first flag being equal to the second value, a second flag is parsed to determine whether the second flag is equal to the first value or the second value. In operation 1007, responsive to the second flag being equal to the first value, the third transform is selected to decode the encoded block in both the vertical direction and the horizontal direction. In operation 1009, responsive to the second flag being equal to the second value, a third flag is parsed to determine whether the third flag is equal to the first value or the second value. In operation 1011, responsive to the third flag being equal to the first value, a more preferred transform combination is selected to decode the encoded block. In operation 1013, responsive to the third flag being equal to the second value, a less preferred transform combination is selected to decode the encoded block.
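The parsing order of operations 1001 to 1013 may be easier to follow as a short sketch. The code below is an assumption-laden illustration, not the claimed implementation: the function name, the callback `flag_is_first_value` (which stands in for parsing one CABAC-coded flag and comparing it with the "first value") and the concrete transform names used in the usage example are all hypothetical.

```python
# Illustrative sketch of the three-flag decision tree of Figure 10.
# Identifiers are hypothetical; the callback abstracts away CABAC parsing.

def select_combination(flag_is_first_value, first, third,
                       more_preferred, less_preferred):
    """Return the (horizontal, vertical) transform pair for the block.

    flag_is_first_value() parses the next flag from the bitstream and
    returns True when that flag equals the "first value". A flag is only
    parsed when the previous flag did not already settle the choice.
    """
    if flag_is_first_value():      # operations 1001 and 1003
        return (first, first)
    if flag_is_first_value():      # operations 1005 and 1007
        return (third, third)
    if flag_is_first_value():      # operations 1009 and 1011
        return more_preferred
    return less_preferred          # operation 1013


# Usage example with assumed transform names and pre-decoded flag outcomes.
outcomes = iter([False, False, True])
pair = select_combination(lambda: next(outcomes), "DCT-2", "DST-7",
                          ("DST-7", "DCT-8"), ("DCT-8", "DST-7"))
```

Passing a callback rather than a list of flags mirrors the description: the second and third flags are parsed only when the preceding flag equals the second value.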
[0073] In a fifth embodiment that is one of the set of embodiments, if the block is using intra prediction, the combination of using DST-7 horizontally and DCT-X vertically is regarded as more preferred if the intra direction is closer to horizontal than to vertical. At the same time, if the intra direction is closer to vertical than to horizontal, the combination of using DCT-X horizontally and DST-7 vertically is regarded as more preferred. Thus, the decoder will determine the combination based on the intra direction of the block.
[0074] If the intra direction is, for example, purely horizontal, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as '0', the decoder will use a transform combination of DCT-X horizontally and DST-7 vertically.
[0075] If the intra direction is, for example, purely vertical, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DCT-X horizontally and DST-7 vertically. If the flag is read as '0', the decoder will use a transform combination of DST-7 horizontally and DCT-X vertically.
[0076] In a sixth embodiment that is one of the set of embodiments, if the block is using inter prediction, the combination of using DST-7 horizontally and DCT-X vertically is regarded as more probable if the block has a larger width than height. If the block has a larger height than width the combination of using DCT-X horizontally and DST-7 vertically is regarded as more probable.
[0077] If the block has, for example, a size of 16x4 samples, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DST-7 horizontally and DCT-X vertically. If the flag is read as '0', the decoder will use a transform combination of DCT-X horizontally and DST-7 vertically.
[0078] If the block has, for example, a size of 4x16 samples, and the decoder reads the mts_tu_flag as '1', it will use a transform combination of DCT-X horizontally and DST-7 vertically. If the flag is read as '0', the decoder will use a transform combination of DST-7 horizontally and DCT-X vertically.
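The shape-dependent interpretation of the mts_tu_flag for inter coded blocks can be sketched as below. The function name and parameters are hypothetical. In particular, the text does not state how square blocks are handled, so the `square_as_wide` parameter is purely an assumption of this sketch, and "DCT-8" is used only as a placeholder default for DCT-X.

```python
# Illustrative sketch of the sixth embodiment (inter coded blocks).
# Identifiers are hypothetical. Handling of square blocks is NOT specified
# in the text; square_as_wide is an assumption made for this example.

def inter_combination(width, height, mts_tu_flag, dct_x="DCT-8",
                      square_as_wide=True):
    """Return the (horizontal, vertical) transforms for an inter block."""
    wide = width > height or (width == height and square_as_wide)
    dst_horizontal = ("DST-7", dct_x)   # DST-7 horizontally, DCT-X vertically
    dst_vertical = (dct_x, "DST-7")     # DCT-X horizontally, DST-7 vertically
    if wide:
        more_probable, less_probable = dst_horizontal, dst_vertical
    else:
        more_probable, less_probable = dst_vertical, dst_horizontal
    # A flag value of 1 selects the more probable combination.
    return more_probable if mts_tu_flag == 1 else less_probable
```

For the 16x4 and 4x16 examples in the text, this sketch returns the same combinations as described for both flag values.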
[0079] In the embodiments above, a 45-degree prediction direction is equally close to vertical as to horizontal. Therefore, the decoder and encoder have to agree on a tie-breaking rule to treat 45-degree directions in the same manner. In the set of embodiments above, this is handled by treating 45-degree directions as more vertical than horizontal. In a different embodiment, it may be advantageous to use a different tie-breaking rule, such as treating 45-degree directions as horizontal. Another possibility is to change at another degree than 45-degree directions. As an example, it may be advantageous to treat not only 45-degree directions as vertical, but also treat, for example, 43-degree directions as vertical, although mathematically they are closer to a horizontal direction. In general, it is therefore possible to use any angle in the tie-break rule, not just diagonal directions.
[0080] Another case where a tie-breaking rule should be defined is that of the non-directional intra prediction modes (planar or DC). In the set of embodiments above, these predictions are treated as more horizontal than vertical. In a slightly different embodiment, it might be advantageous to treat these as more vertical than horizontal. For example, in an implementation, the intra modes 0-34 are treated as being closer to horizontal and the intra modes 35-66 are treated as being closer to vertical.
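The example implementation with intra modes 0-34 and 35-66 amounts to a single threshold on the mode index, which also makes the tie-breaking rule explicit. The sketch below assumes that reading. The names `intra_is_horizontal`, `intra_combination` and the `first_vertical_mode` parameter are hypothetical, and "DCT-8" again only stands in for DCT-X.

```python
# Illustrative sketch of the fifth embodiment with a threshold tie-break.
# Identifiers are hypothetical. The default threshold of 35 follows the
# example implementation in the text (modes 0-34 horizontal, 35-66 vertical).

def intra_is_horizontal(intra_mode, first_vertical_mode=35):
    """Classify an intra mode (0..66) as closer to horizontal or not."""
    if not 0 <= intra_mode <= 66:
        raise ValueError("intra mode must be in the range 0..66")
    return intra_mode < first_vertical_mode


def intra_combination(intra_mode, mts_tu_flag, dct_x="DCT-8",
                      first_vertical_mode=35):
    """Return the (horizontal, vertical) transforms for an intra block."""
    dst_horizontal = ("DST-7", dct_x)
    dst_vertical = (dct_x, "DST-7")
    if intra_is_horizontal(intra_mode, first_vertical_mode):
        more_preferred, less_preferred = dst_horizontal, dst_vertical
    else:
        more_preferred, less_preferred = dst_vertical, dst_horizontal
    return more_preferred if mts_tu_flag == 1 else less_preferred
```

Moving `first_vertical_mode` changes where the classification flips, which corresponds to choosing a different angle for the tie-break rule. Encoder and decoder must use the same value.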
[0081] In a seventh embodiment, change 6 is used for intra coded blocks. The selection of which context to use for encoding and decoding the mts_cu_flag is made based on the longer side of the block and the intra direction. The intra directions are divided into two groups, one where using the DCT-2 horizontally and vertically is more preferred and one where using the DCT-2 horizontally and vertically is less preferred. These groups can be identical for different block sizes. Using the DCT-2 both horizontally and vertically can for example be more preferred if the intra mode is close to horizontal or vertical. In the same example, the combination would be less preferred if the intra direction is close to diagonal.
[0082] Turning to Figure 11, in operation 1101, the decoder determines if the block is of size 32xN or Nx32 where N can be 4, 8, 16 or 32. In operation 1103, responsive to the block being one of size 32xN or Nx32 and the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1103), for instance if it is purely horizontal, one context will be chosen, for example with a first identifier (id) 0 in operation 1105.
[0083] Responsive to the block being of size 32xN or Nx32 where N can be 4, 8, 16 or 32 and the intra direction is close to diagonal (i.e., it does not pass one of a horizontal closeness test or a vertical closeness test as determined in operation 1103), for instance if it is purely diagonal, a different context will be chosen, for example with a second id 1 in operation 1107.
[0084] In operation 1108, the decoder determines if the block is of size 16xN or Nx16 where N can be 4, 8 or 16. In operation 1111, responsive to the block being one of size 16xN or Nx16 and the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1111), for instance if it is purely vertical, a different context will be chosen, for example with a third id 2 in operation 1113.
[0085] Responsive to the block being of size 16xN or Nx16 where N can be 4, 8 or 16 and the intra direction is close to diagonal (i.e., it does not pass one of a horizontal closeness test or a vertical closeness test as determined in operation 1111), for instance if it is purely diagonal, a different context will be chosen, for example with a fourth id 3 in operation 1115.
[0086] In operation 1117, the decoder determines if the block is of size 8x8, 8x4, 4x8 or 4x4. In operation 1119, responsive to the block being one of size 8x8, 8x4, 4x8, or 4x4 and the intra direction is close to horizontal or close to vertical (i.e., it passes one of a horizontal closeness test or a vertical closeness test as determined in operation 1119), for instance if it is purely horizontal, a different context will be chosen, for example with a fifth id 4 in operation 1121.
[0087] Responsive to the block being of size 8x8, 8x4, 4x8 or 4x4 and the intra direction is close to diagonal (i.e., it does not pass one of a horizontal closeness test or a vertical closeness test as determined in operation 1119), for instance if it is purely diagonal, a different context will be chosen, for example with a sixth id 5 in operation 1123.
[0088] This can be summarized in the following table:
Figure imgf000025_0001
Table 5
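The context selection of the seventh embodiment can also be written as a small lookup on the longer block side plus a closeness test. The following is a sketch under stated assumptions: the function names are hypothetical, and the closeness test uses the example mode ranges (10-22 as close to horizontal, 46-57 as close to vertical) that the description gives for one implementation.

```python
# Illustrative sketch of the intra context selection (seventh embodiment).
# Identifiers are hypothetical. The mode ranges follow the example
# implementation given in the description and are not the only option.

def near_horizontal_or_vertical(intra_mode):
    """Closeness test: True for modes treated as horizontal or vertical."""
    return 10 <= intra_mode <= 22 or 46 <= intra_mode <= 57


def intra_context_id(width, height, intra_mode):
    """Return the context id (0..5) used for coding the mts_cu_flag."""
    longer_side = max(width, height)
    if longer_side == 32:          # 32xN or Nx32
        base = 0
    elif longer_side == 16:        # 16xN or Nx16
        base = 2
    elif longer_side in (4, 8):    # 8x8, 8x4, 4x8 or 4x4
        base = 4
    else:
        raise ValueError("unsupported block size")
    # Even ids: close to horizontal or vertical. Odd ids: close to diagonal.
    return base if near_horizontal_or_vertical(intra_mode) else base + 1
```

Under these example ranges the non-directional modes (planar and DC) fall into the diagonal group, which is one possible tie-breaking choice.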
[0089] As described in the previous embodiment, a set of tie-breaking rules should be defined for the encoder and decoder for the cases where a prediction direction is equally close to horizontal and vertical. Tie-breaking rules should also be defined for the non-directional intra prediction modes Planar or DC. For example, in one implementation, the intra modes 10-22 may be seen as close to horizontal and may be treated as being horizontal, the intra modes 46-57 may be seen as close to vertical and may be treated as being vertical, and the remaining intra modes 0-9, 23-45 and 58-66 may be seen as close to diagonal and be treated as being diagonal.
[0090] In an eighth embodiment, change 6 is used for inter coded blocks. The selection of which context to use for encoding and decoding the mts_cu_flag is made based on the block size and shape. For example, the six contexts can be selected as follows:
a) If the block is of size 4x32 or 32x4 in the eighth embodiment, one context is used, for example with identifier (id) 0.
b) If the block is of size 4x16, 8x32, 32x8 or 16x4 in the eighth embodiment, a different context is used, for example with id 1.
c) If the block is of size 4x8 or 8x4 in the eighth embodiment, a different context is used, for example with id 2.
d) If the block is of size 8x16, 16x32, 32x16 or 16x8 in the eighth embodiment, a different context is used, for example with id 3.
e) If the block is of size 16x16 or 32x32 in the eighth embodiment, a different context is used, for example with id 4.
f) If the block is of size 8x8 or 4x4 in the eighth embodiment, a different context is used, for example with id 5.
[0091] The eighth embodiment can be summarized in Table 6:
Figure imgf000026_0001
Table 6
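Since the eighth embodiment lists the block sizes for each context explicitly, a direct lookup table is a natural way to sketch it. The names `SIZE_GROUPS`, `CONTEXT_ID` and `inter_context_id` are hypothetical; the groups and example ids are those of items a) to f).

```python
# Illustrative sketch of the inter context selection (eighth embodiment).
# Identifiers are hypothetical; the groups and ids follow items a) to f).

SIZE_GROUPS = [
    [(4, 32), (32, 4)],                     # a) id 0
    [(4, 16), (8, 32), (32, 8), (16, 4)],   # b) id 1
    [(4, 8), (8, 4)],                       # c) id 2
    [(8, 16), (16, 32), (32, 16), (16, 8)], # d) id 3
    [(16, 16), (32, 32)],                   # e) id 4
    [(8, 8), (4, 4)],                       # f) id 5
]

# Map every (width, height) pair to the id of the group that contains it.
CONTEXT_ID = {
    size: context
    for context, group in enumerate(SIZE_GROUPS)
    for size in group
}


def inter_context_id(width, height):
    """Return the context id used for coding the mts_cu_flag."""
    return CONTEXT_ID[(width, height)]
```

The six groups together cover all sixteen combinations of side lengths 4, 8, 16 and 32, so every such block size maps to exactly one context.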
[0092] In a ninth embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 16 or 32 samples. In this embodiment all blocks of size 8x8 or smaller evaluate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[0093] For blocks of size 16xN, Nx16, 32xN or Nx32 in the ninth embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[0094] The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8x8 or smaller, the decoded bins can indicate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[0095] If the block is of size 16xN, Nx16, 32xN or Nx32 in the ninth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 16xN, Nx16, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[0096] Table 7 shows where DCT-2 and DCT-8 are used in the ninth embodiment:
Figure imgf000027_0001
Table 7
[0097] In a tenth embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples or 4 samples. In this embodiment all blocks of size 8x8, 8x16, 16x8 or 16x16 evaluate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[0098] For blocks of size 4xN, Nx4, 32xN or Nx32 in the tenth embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[0099] The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 8x8, 8x16, 16x8 or 16x16, the decoded bins can indicate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[00100] If the block is of size 4xN, Nx4, 32xN or Nx32 in the tenth embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 4xN, Nx4, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[00101] Table 8 shows where DCT-2 and DCT-8 are used in the tenth embodiment:
Figure imgf000028_0001
Table 8
[00102] In an eleventh embodiment, change 1 is done for all block sizes where the MTS tool is allowed, and change 2 is done for all blocks where at least one dimension has a length of 32 samples or the block has a size of 4x4 samples. In this embodiment all blocks of size 16x16 or smaller but larger than 4x4 evaluate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[00103] For blocks of size 4x4, 32xN or Nx32 in the eleventh embodiment, where N can be 4, 8, 16 or 32, the following combinations are evaluated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[00104] The decoder can determine the correct combination of transforms based on the parsed flags and the block size. If the block is of size 16x16 or smaller but larger than 4x4, the decoded bins can indicate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-8 vertically
DCT-8 horizontally and DST-7 vertically
[00105] If the block is of size 4x4, 32xN or Nx32 in the eleventh embodiment, where N can be 4, 8, 16 or 32 (i.e., the first criterion of Figure 9 is met when the encoded block has a size of form 4x4, 32xN or Nx32 where N can assume the values 4, 8, 16, or 32), the following combinations can be indicated:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[00106] Table 9 shows where DCT-2 and DCT-8 are used in the eleventh embodiment:
Figure imgf000029_0001
Table 9
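The ninth, tenth and eleventh embodiments differ only in the block-size criterion that decides whether DCT-2 or DCT-8 is paired with DST-7. The sketch below puts the three criteria side by side. All names are hypothetical, sizes are given as (width, height), and side lengths are assumed to be limited to 4, 8, 16 and 32.

```python
# Illustrative comparison of the size criteria of the ninth, tenth and
# eleventh embodiments. Identifiers are hypothetical.

SIDES = (4, 8, 16, 32)

# Each predicate returns True when change 2 applies, i.e. when DCT-2
# takes the place of DCT-8 in the mixed transform combinations.
USES_DCT2 = {
    "ninth": lambda w, h: 16 in (w, h) or 32 in (w, h),
    "tenth": lambda w, h: 32 in (w, h) or 4 in (w, h),
    "eleventh": lambda w, h: 32 in (w, h) or (w, h) == (4, 4),
}


def mixed_partner(embodiment, width, height):
    """Return the transform paired with DST-7 in the mixed combinations."""
    return "DCT-2" if USES_DCT2[embodiment](width, height) else "DCT-8"


def partner_grid(embodiment):
    """Return rows (one per height) of partner transforms for all sizes."""
    return [[mixed_partner(embodiment, w, h) for w in SIDES] for h in SIDES]
```

Calling `partner_grid` for each embodiment yields a per-size map of where DCT-2 and DCT-8 are used, derived only from the criteria stated in the text.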
[00107] In a twelfth embodiment, changes 1 and 2 are done for all block sizes where the MTS tool is allowed. In this embodiment all blocks evaluate the following combinations:
DCT-2 horizontally and DCT-2 vertically
DST-7 horizontally and DST-7 vertically
DST-7 horizontally and DCT-2 vertically
DCT-2 horizontally and DST-7 vertically
[00108] The decoder can determine the correct combination of transforms based on the parsed flags. The decoded bins can indicate the following combinations:
- DCT-2 horizontally and DCT-2 vertically
- DST-7 horizontally and DST-7 vertically
- DST-7 horizontally and DCT-2 vertically
- DCT-2 horizontally and DST-7 vertically
[00109] In a further set of embodiments, change 7 is incorporated. A new flag, called mts_same_flag, is signaled to indicate whether a block uses the same transform in both the horizontal and vertical directions. In one embodiment, if the flag has the value '1', the block uses identical transforms in both directions, whereas if the flag has the value '0', two different transforms will be used.
[00110] In an embodiment, the mts_same_flag indicates that a block uses the same transform in both horizontal and vertical direction. An additional flag mts_tu_idx is signaled to indicate whether to use DCT-8 or DST-7 in both directions.
[00111] In another embodiment, the mts_same_flag indicates that a block uses different transforms in horizontal and vertical direction. An additional flag mts_tu_idx is signaled to indicate whether to use DCT-8 in the horizontal direction and DST-7 in the vertical direction, or DST-7 in the horizontal direction and DCT-8 in the vertical direction.
[00112] The processing in the decoder works analogously. First, the mts_same_flag is parsed by the decoder, followed by parsing the mts_tu_idx to determine the correct combination of transforms to use.
[00113] In another embodiment, the mts_same_flag is parsed by the decoder, indicating that the same transform should be used in both horizontal and vertical direction. Afterwards, the mts_tu_idx is parsed by the decoder, indicating whether to use DST-7 or DCT-8 in both directions.
[00114] In another embodiment, the mts_same_flag is parsed by the decoder, indicating that two different transforms should be used for the current block. The mts_tu_idx is parsed by the decoder to determine whether to use DCT-8 in horizontal and DST-7 in vertical direction, or DST-7 in horizontal and DCT-8 in vertical direction.
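The two-step parsing of mts_same_flag followed by mts_tu_idx can be sketched as follows. The function name is hypothetical. The description does not state which mts_tu_idx value maps to which transform, so the mapping used here (0 for the DST-7-first choice, 1 for the DCT-8-first choice) is purely an assumption of this example.

```python
# Illustrative sketch of decoding with mts_same_flag and mts_tu_idx.
# Identifiers are hypothetical. The meaning assigned to each mts_tu_idx
# value below is an ASSUMPTION; the description does not fix this mapping.

def decode_mts(mts_same_flag, mts_tu_idx):
    """Return the (horizontal, vertical) transforms for the block."""
    if mts_same_flag == 1:
        # Same transform in both directions: DST-7 or DCT-8.
        transform = "DST-7" if mts_tu_idx == 0 else "DCT-8"
        return (transform, transform)
    # Different transforms in the two directions.
    if mts_tu_idx == 0:
        return ("DST-7", "DCT-8")
    return ("DCT-8", "DST-7")
```

With this arrangement each of the four DST-7 and DCT-8 combinations is reached by exactly one pair of flag values.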
[00115] Thus, disabling one of the transform combinations enables the change to the CABAC coding in which the two existing flags are replaced, as described herein, with two new flags. Another key aspect is to replace one transform in certain cases by a different transform.
REFERENCES
[1] F. Bossen, J. Boyce, X. Li, V. Seregin, K. Sühring (editors): “JVET common test conditions and software reference configurations for SDR video”, JVET-L1010, Macau, October 2018
[2] G. J. Sullivan, J.-R. Ohm: “Meeting Report of the 11th JVET Meeting, (Ljubljana, 10-18 July 2018)”, section 6.6, JVET-K1000, Ljubljana, July 2018

Claims

1. A method performed by a decoder, the method comprising:
receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding;
parsing the at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction;
responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction:
decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block;
responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction:
parsing a second flag of the at least one flag to determine if the second flag is set to signal that a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block;
responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction:
decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block;
responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction:
parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block; and
decoding the encoded video block using the second and third transforms to generate the decoded video block.
2. The method of claim 1 wherein the second transform comprises one of two transforms, the method further comprising parsing the second flag to determine which one of the two transforms is to be used to decode the video block.
3. The method of claim 2 wherein the two transforms comprise a Discrete Sine Transformation, DST-7, and a Discrete Cosine Transformation, DCT-8.
4. The method of any of claims 1-3 wherein the first transform comprises a DCT-2 transform.
5. The method of any of claims 1-4 wherein the second transform comprises a DST-7 transform.
6. The method of any of claims 1-5 wherein the third transform comprises one of the DCT-2 transform or a DCT-8 transform.
7. The method of any of claims 1-6, wherein decoding in the horizontal direction comprises applying a transform from the plurality of transforms in the horizontal direction and wherein encoding in the vertical direction comprises applying a transform from the plurality of transforms in the vertical direction.
8. A decoder comprising:
at least one processor (301);
memory (303) coupled to the processor, said memory comprising instructions executable by the processor, which cause the processor to perform operations comprising:
receiving an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding;
parsing the at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction;
responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction:
decoding the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block;
responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction:
parsing a second flag of the at least one flag to determine if the second flag is set to signal a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block;
responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction:
decoding the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block;
responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction:
parsing a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block; and
decoding the encoded video block using the second and third transforms to generate the decoded video block.
9. The decoder of claim 8 wherein the second transform comprises one of two transforms, and wherein the memory further comprises instructions which cause the processor to perform parsing the second flag to determine which one of the two transforms is to be used to decode the video block.
10. The decoder of claim 9 wherein the two transforms comprise a Discrete Sine Transformation, DST-7, and a Discrete Cosine Transformation, DCT-8 transform.
11. The decoder of any of claims 8-10, wherein the first transform comprises a DCT-2 transform.
12. The decoder of any of claims 8-11, wherein the second transform comprises a DST-7 transform.
13. The decoder of any of claims 8-12, wherein the third transform comprises one of the DCT-2 transform or a DCT-8 transform.
14. A computer program for a decoder, the computer program comprising code means which, when run on a computer, causes the computer to:
receive an encoded video block having at least one flag encoded using context-based adaptive arithmetic coding;
parse the at least one flag to determine if the at least one flag is set to signal that a first transform of a plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction;
responsive to the at least one flag being set to signal that the first transform is to be used in both the horizontal direction and in the vertical direction:
decode the encoded video block in the horizontal direction and the vertical direction using the first transform to generate a decoded video block;
responsive to the at least one flag being set to signal that the first transform is not to be used in both the horizontal direction and in the vertical direction:
parse a second flag of the at least one flag to determine if the second flag is set to signal that a second transform of the plurality of transforms is to be used to decode the encoded video block in both a horizontal direction and a vertical direction to generate the decoded video block;
responsive to the second flag being set to signal that the second transform is to be used in both the horizontal direction and in the vertical direction:
decode the encoded video block in the horizontal direction and the vertical direction using the second transform to generate the decoded video block;
responsive to the second flag being set to signal that the second transform is not to be used in both the horizontal direction and in the vertical direction:
parse a third flag of the at least one flag to determine in which of the horizontal direction or the vertical direction the second transform is to be used to decode the encoded video block and in which of the horizontal direction or the vertical direction a third transform is to be used to decode the encoded video block; and
decode the encoded video block using the second and third transforms to generate the decoded video block.
15. A computer program product comprising computer readable means (303) and a computer program according to claim 14, stored on the computer readable means.
16. A method performed by an encoder, the method comprising:
receiving a video block for encoding;
determining a characteristic of the video block;
responsive to the characteristic being of a type that indicates a multiple transform selection is used:
selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block;
testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction;
selecting a combination from the combinations that provides the lowest rate distortion;
encoding the video block using the selected combination to generate an encoded video block;
responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used:
encoding the video block using a default transform in the horizontal direction and the vertical direction.
17. The method of claim 16 wherein selecting the first transform comprises selecting a transform that is similar to another transform of the plurality of transforms and is more computationally complex than the other transform of the plurality of transforms.
18. The method of any of claims 16-17 further comprising determining the plurality of transforms that are to be tested.
19. The method of claim 18 wherein the plurality of transforms comprises a Discrete Cosine Transformation, DCT-2 transform, a DCT-8 transform, and a Discrete Sine Transformation, DST-7 transform.
20. The method of any of claims 16-19 wherein the characteristic of the video block comprises a dimension of the video block.
21. The method of any of claims 16-19 wherein the characteristic of the video block comprises one of a block size and/or a block shape.
22. The method of any of claims 16-21 wherein the characteristic comprises a block size of the video block being of the form 32xN or Nx32 where N can assume the values 4, 8, 16 or 32 and testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction comprises:
responsive to a block being of size 16x16 or smaller, evaluating combinations of DCT-2 horizontally and DCT-2 vertically, DST-7 horizontally and DST-7 vertically, DST-7 horizontally and DCT-8 vertically, and DCT-8 horizontally and DST-7 vertically; and
responsive to a block being of size 32xN or Nx32, where N can be 4, 8, 16 or 32, evaluating combinations of DCT-2 horizontally and DCT-2 vertically, DST-7 horizontally and DST-7 vertically, DST-7 horizontally and DCT-2 vertically, and DCT-2 horizontally and DST-7 vertically.
23. An encoder for encoding a block of video based on a block size of the block, wherein each of a horizontal direction and a vertical direction of the block is encoded using a transform, wherein the transform can be one of a first transform, a second transform or a third transform, the encoder comprising:
at least one processor (201);
memory (203) coupled to the processor, said memory comprising instructions executable by the processor, which cause the processor to perform operations comprising:
determining a characteristic of the video block;
responsive to the characteristic being of a type that indicates a multiple transform selection is used:
selecting a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block;
testing combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction;
selecting a combination from the combinations that provides the lowest rate distortion;
encoding the video block using the selected combination to generate an encoded video block;
responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used:
encoding the video block using a default transform in the horizontal direction and the vertical direction.
24. The encoder of claim 23 wherein the plurality of transforms comprises a Discrete Cosine Transformation, DCT-2 transform, a DCT-8 transform, and a Discrete Sine Transformation, DST-7 transform.
25. A computer program for an encoder, the computer program comprising code means which, when run on a computer, causes the computer to:
determine a characteristic of the video block;
responsive to the characteristic being of a type that indicates that a multiple transform selection is used:
select a first transform in a plurality of transforms that is part of the multiple transform selection and that is one of most computationally expensive to use or least likely to be used in encoding the video block;
test combinations of the plurality of transforms in a horizontal direction and a vertical direction without testing a combination where the first transform is used in both the horizontal direction and the vertical direction;
select a combination from the combinations that provides the lowest rate distortion;
encode the video block using the selected combination to generate an encoded video block;
transmit, to a decoder through a network, the encoded video block with flags to indicate the selected combination; and
responsive to the characteristic being of a type that indicates a multiple transform selection is not to be used:
encode the video block using a default transform in the horizontal direction and the vertical direction.
26. A computer program product comprising computer readable means (303) and a computer program according to claim 25, stored on the computer readable means.
PCT/SE2019/051206 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder WO2020139182A1 (en)

Priority Applications (10)

Application Number Priority Date Filing Date Title
US16/640,010 US11082692B2 (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder
JP2021537996A JP7257523B2 (en) 2018-12-28 2019-11-28 Method and Apparatus for Selecting Transform Choices in Encoders and Decoders
KR1020217023848A KR20210104895A (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in encoder and decoder
CN201980086072.3A CN113302923B (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in encoder and decoder
MX2021007633A MX2021007633A (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder.
EP19902876.2A EP3903487A4 (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder
RU2021122027A RU2767513C1 (en) 2018-12-28 2019-11-28 Method and equipment for performing transformation selection in encoder and decoder
US17/360,088 US11558613B2 (en) 2018-12-28 2021-06-28 Method and apparatus for selecting transform selection in an encoder and decoder
CONC2021/0009769A CO2021009769A2 (en) 2018-12-28 2021-07-26 Selection method and apparatus for applying transform in an encoder and decoder
US18/077,414 US11991359B2 (en) 2018-12-28 2022-12-08 Method and apparatus for selecting transform selection in an encoder and decoder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862785856P 2018-12-28 2018-12-28
US62/785,856 2018-12-28

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US16/640,010 A-371-Of-International US11082692B2 (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder
US17/360,088 Continuation US11558613B2 (en) 2018-12-28 2021-06-28 Method and apparatus for selecting transform selection in an encoder and decoder

Publications (1)

Publication Number Publication Date
WO2020139182A1 (en) 2020-07-02

Family

ID=71129247

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2019/051206 WO2020139182A1 (en) 2018-12-28 2019-11-28 Method and apparatus for selecting transform selection in an encoder and decoder

Country Status (9)

Country Link
US (3) US11082692B2 (en)
EP (1) EP3903487A4 (en)
JP (1) JP7257523B2 (en)
KR (1) KR20210104895A (en)
CN (1) CN113302923B (en)
CO (1) CO2021009769A2 (en)
MX (1) MX2021007633A (en)
RU (1) RU2767513C1 (en)
WO (1) WO2020139182A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3691270A4 (en) * 2017-09-28 2021-06-02 Samsung Electronics Co., Ltd. Encoding method and device, and decoding method and device
JP2021529462A (en) * 2018-06-29 2021-10-28 ヴィド スケール インコーポレイテッド Selection of adaptive control points for video coding based on affine motion model
MX2021016152A (en) * 2019-06-19 2022-02-22 Lg Electronics Inc Signaling of information indicating transform kernel set in image coding.

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130188699A1 (en) * 2012-01-22 2013-07-25 Qualcomm Incorporated Coding of coefficients in video coding
US20130272378A1 (en) * 2012-04-16 2013-10-17 Qualcomm Incorporated Coefficient groups and coefficient coding for coefficient scans
US20160219290A1 (en) * 2015-01-26 2016-07-28 Qualcomm Incorporated Enhanced multiple transforms for prediction residual
US20170251224A1 (en) * 2014-06-20 2017-08-31 Samsung Electronics Co., Ltd. Method and device for transmitting prediction mode of depth image for interlayer video encoding and decoding
US20180262763A1 (en) * 2017-03-10 2018-09-13 Qualcomm Incorporated Intra filtering flag in video coding
US20180332289A1 (en) * 2017-05-11 2018-11-15 Mediatek Inc. Method and Apparatus of Adaptive Multiple Transforms for Video Coding
WO2019026807A1 (en) * 2017-08-03 2019-02-07 Sharp Kabushiki Kaisha Systems and methods for partitioning video blocks in an inter prediction slice of video data
US20190306521A1 (en) * 2018-03-29 2019-10-03 Tencent America LLC Transform information prediction
WO2019230670A1 (en) * 2018-05-31 2019-12-05 Sharp Kabushiki Kaisha Systems and methods for partitioning video blocks in an inter prediction slice of video data

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101805531B1 (en) 2009-06-07 2018-01-10 엘지전자 주식회사 A method and an apparatus for decoding a video signal
KR101753273B1 (en) * 2010-03-10 2017-07-03 톰슨 라이센싱 Methods and apparatus for constrained transforms for video coding and decoding having transform selection
US10362316B2 (en) 2010-04-01 2019-07-23 Sony Corporation Image processing device and method
US8976861B2 (en) 2010-12-03 2015-03-10 Qualcomm Incorporated Separately coding the position of a last significant coefficient of a video block in video coding
US9042440B2 (en) 2010-12-03 2015-05-26 Qualcomm Incorporated Coding the position of a last significant coefficient within a video block based on a scanning order for the block in video coding
US20120163448A1 (en) 2010-12-22 2012-06-28 Qualcomm Incorporated Coding the position of a last significant coefficient of a video block in video coding
US20120163472A1 (en) 2010-12-22 2012-06-28 Qualcomm Incorporated Efficiently coding scanning order information for a video block in video coding
MX2013013483A (en) 2011-06-27 2014-02-27 Panasonic Corp Image decoding method, image encoding method, image decoding device, image encoding device, and image encoding/decoding device.
US9210438B2 (en) * 2012-01-20 2015-12-08 Sony Corporation Logical intra mode naming in HEVC video coding
US10257520B2 (en) 2012-06-26 2019-04-09 Velos Media, Llc Modified coding for transform skipping
EP3262837A4 (en) * 2015-02-25 2018-02-28 Telefonaktiebolaget LM Ericsson (publ) Encoding and decoding of inter pictures in a video
WO2017138791A1 (en) * 2016-02-12 2017-08-17 삼성전자 주식회사 Image encoding method and apparatus, and image decoding method and apparatus
US10972733B2 (en) * 2016-07-15 2021-04-06 Qualcomm Incorporated Look-up table for enhanced multiple transform
WO2018128222A1 (en) * 2017-01-03 2018-07-12 엘지전자 주식회사 Method and apparatus for image decoding in image coding system
US10805641B2 (en) 2017-06-15 2020-10-13 Qualcomm Incorporated Intra filtering applied together with transform processing in video coding
US11134272B2 (en) * 2017-06-29 2021-09-28 Qualcomm Incorporated Memory reduction for non-separable transforms
JP6863208B2 (en) * 2017-09-29 2021-04-21 株式会社ニューフレアテクノロジー Multi-charged particle beam drawing device and multi-charged particle beam drawing method
WO2020046091A1 (en) * 2018-09-02 2020-03-05 엘지전자 주식회사 Image coding method based on multiple transform selection and device therefor
EP4152748A1 (en) * 2018-09-02 2023-03-22 LG Electronics, Inc. Method and apparatus for processing image signal
US10819979B2 (en) * 2018-09-06 2020-10-27 Tencent America LLC Coupled primary and secondary transform

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
CHRISTOPHER HOLLMANN ET AL.: "CE-6 related: Transform Simplification", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 13TH MEETING, Marrakech, MA *
CHRISTOPHER HOLLMANN ET AL.: "CE6: Transform Simplification (CE6-2.3a-c)", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 14TH MEETING, 27 March 2019 (2019-03-27), Geneva, CH, XP030203137 *
CHRISTOPHER HOLLMANN ET AL.: "CE6-related: Transform Candidate Ordering", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 14TH MEETING, 19 March 2019 (2019-03-19), Geneva, CH, XP030203139 *
CHRISTOPHER HOLLMANN ET AL.: "CE6-related: Transform Simplification", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 12TH MEETING, 3 October 2018 (2018-10-03), Macau, CN, XP030190485 *
CHRISTOPHER HOLLMANN ET AL.: "Non-CE6: Reduced MTS", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 15TH MEETING, 3 July 2019 (2019-07-03), Gothenburg, SE, XP030219599 *
JANI LAINEMA: "CE6-related: Shape adaptive transform selection", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 12TH MEETING, 3 October 2018 (2018-10-03), Macao, CN, XP030192975 *
MISCHA SIEKMANN ET AL.: "CE6 - related: ''Set of Transforms", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 11TH MEETING: LJUBLJANA, SI, 18 July 2018 (2018-07-18), XP030195862 *
See also references of EP3903487A4 *
XIN ZHAO ET AL.: "CE6-3.1: Coupled primary and secondary transform", JOINT VIDEO EXPERTS TEAM (JVET) OF ITU-T SG 16 WP 3 AND ISO/IEC JTC 1/SC 29/WG 11 11TH MEETING, 10 July 2018 (2018-07-10), Ljubljana, SI, XP030191024 *

Also Published As

Publication number Publication date
US11082692B2 (en) 2021-08-03
CN113302923A (en) 2021-08-24
CN113302923B (en) 2024-04-02
JP2022516497A (en) 2022-02-28
US20230109113A1 (en) 2023-04-06
CO2021009769A2 (en) 2021-08-09
US20210136376A1 (en) 2021-05-06
KR20210104895A (en) 2021-08-25
MX2021007633A (en) 2021-08-11
US11558613B2 (en) 2023-01-17
US20210329243A1 (en) 2021-10-21
JP7257523B2 (en) 2023-04-13
RU2767513C1 (en) 2022-03-17
EP3903487A4 (en) 2022-09-21
US11991359B2 (en) 2024-05-21
EP3903487A1 (en) 2021-11-03

Similar Documents

Publication Publication Date Title
US10687058B2 (en) Method and apparatus for coding of intra prediction mode
US11991359B2 (en) Method and apparatus for selecting transform selection in an encoder and decoder
CN113841409B (en) Conditional use of simplified quadratic transforms for video processing
US9161046B2 (en) Determining quantization parameters for deblocking filtering for video coding
KR101356733B1 (en) Method and apparatus for Context Adaptive Binary Arithmetic Coding and decoding
US10743031B2 (en) Method and apparatus for syntax redundancy removal in palette coding
AU2018251489B2 (en) Method and device for entropy encoding, decoding video signal
CN113711597B (en) Context modeling and selection of multiple transformation matrices
CN103210647A (en) Method and apparatus of delta quantization parameter processing for high efficiency video coding
US11483562B2 (en) Method and apparatus for video encoding and decoding based on context switching
US11695962B2 (en) Encoding and decoding methods and corresponding devices
EP3306924A1 (en) Method and device for context-adaptive binary arithmetic coding a sequence of binary symbols representing a syntax element related to picture data
US11997280B2 (en) Use-case driven context model selection for hybrid video coding tools
TWI789668B (en) Determining a parametrization for context-adaptive binary arithmetic coding
EP2391133A1 (en) Encoding/decoding method and device based on double prediction
CN108702521B (en) Encoding and decoding method, apparatus, encoder, decoder and storage medium
US11743467B2 (en) Method and device for entropy encoding coefficient level, and method and device for entropy decoding coefficient level
US20230291922A1 (en) Encoding and decoding methods and corresponding devices
CN113225567A (en) Method and device for encoding and decoding residual error coefficients of video data and electronic equipment
WO2021207035A1 (en) Methods and apparatus on transform and coefficient signaling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19902876; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021537996; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
ENP Entry into the national phase (Ref document number: 20217023848; Country of ref document: KR; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2019902876; Country of ref document: EP; Effective date: 20210728)